ABSTRACT



This thesis has two parts, both related to the development of smart sensor systems. The first part is a theoretical development of two families of adaptive spatial filters for suppressing background clutter in infrared images. The filters are based on either the minimization of mean squared error criterion or the maximization of signal to noise ratio criterion. Seven different nonlinear search techniques have been developed for the adaptation process. When applied to two real world infrared test images, they exhibit fast convergence with no misadjustment. The second part is an experimental development of a multiple microcomputer system that is a candidate for an on-board processor system. A multiple star, multiple cluster architecture was developed. Its intercommunication is managed by three levels of control: a central controller, a distributed controller and a random priority controller. The adaptive spatial filter has been successfully implemented on this system by partitioning it for parallel computing.
I. INTRODUCTION



A. OBJECTIVES 

1. Dual Objectives of this Thesis

This thesis consists of two closely related studies.

a. The first study is the theoretical development of adaptive image processing algorithms for enhancing the "target signal" to "clutter noise" ratio in images. These algorithms are intended for the first stage of a multiple-stage image processing program for detecting dim targets in noisy infrared images.

b. The second study is the experimental development of a multiple microcomputer system for implementing these adaptive image processing algorithms.

These two studies belong to two different technical areas, and either topic could be the subject of a thesis project on its own. They are investigated together here because of the special nature of the newly emerging field that inspired this research. This field is sometimes known as "smart sensors" [1, 2, 3].

Its development accelerated only in the late 1970s. At that time, advances in two integrated circuit fields, VLSI digital electronics and mosaic optical sensor arrays, were joined to produce optical sensors that also have sophisticated on-board signal/data processing capabilities. In other words, they are SMART SENSORS. Their importance lies in the coexistence of "sensing" and "processing" capabilities on a small-volume, lightweight, low-power platform. Successful development of smart sensor systems therefore requires more than new signal/data processing algorithms to provide the needed "smartness". It also requires efficient implementation by signal/data processors whose size, weight, power and performance are compatible with the requirements of on-board equipment in many practical military systems.

2. Multi-Dimensional "Smart Sensor" Signal Processing

In most optical smart sensor systems, the signals of interest are in the form of images. If the field of view of the sensor platform is not stabilized or locked onto a target, successive image frames are not registered, and signal processing can use only single frames. The signal is therefore two dimensional in the spatial variables x and y. If sensors in several spectral bands are available and well registered spatially, the signal is three dimensional in the variables x, y and λ (wavelength).

In many other smart sensor systems, the field of view of the sensor platform is handled in one of three ways:

- it does not change (as in a synchronous orbit satellite with staring sensors);
- it is stabilized (as in aircraft with step-staring sensors);
- it is locked onto a target (as in missiles that have already acquired a target).

In these cases, successive images are registered, and both single frames and multiple frames are available for signal processing. The signal is then three dimensional in x, y and t. If multi-spectral sensors are also registered, the signal is four dimensional in x, y, t and λ.

Signal processing operations required for smart sensors are therefore often multi-dimensional. This thesis is concerned with adaptive spatial filters for processing infrared images. This type of spatial filter should be distinguished from the majority of image processing methods, which treat the image itself as the signal of interest.

Our primary interest is in the targets. The image itself, often called the background clutter, is considered noise. It must be suppressed so that dim target signals are revealed and a threshold can be applied to initiate the detection process. Besides the clutter, the image may contain other noise, man-made interference and jamming, all of which are treated as noise. Only targets are considered signals.

3. Multiple-Stage "Smart Sensor" Signal Processing

To accomplish the objectives of most smart sensor systems in detecting, tracking and recognizing very dim targets deeply buried in noise, a multiple-stage image processing approach is generally needed (Table I.1).
TABLE I.1

IMAGE PROCESSING STAGES

Objective in             Processing
Various Stages           Stage              Method
Enhancement              Pre-threshold      Hard limiting; adaptive filtering
Detection                Threshold          Adaptive threshold
Target acquisition
Tracking                 Post-threshold     Kalman tracker
Recognition                                 Target recognition

For more detail, see Chapter III.B.2. 

This thesis will concentrate on the development of 
new adaptive filter techniques which will be used in the 
"Enhancement" stage to improve the "target signal" to 
"background clutter noise" ratio by either suppressing the 
background clutter or enhancing the target signal, or both. 

B. STATISTICAL IMAGE PROCESSING TECHNIQUES FOR 
ENHANCEMENT OF "TARGET SIGNAL" TO "BACKGROUND 
NOISE" RATIO IN INFRARED IMAGES 

1. Introduction

Very dim targets are detected through several steps of image processing in the pre-threshold, threshold and post-threshold stages. Among these, the "enhancement" step before thresholding plays a very important role. The "target signal" to "clutter noise" ratio must be improved to approximately one before a threshold operation can be applied. Otherwise, the thresholding step will produce too many false alarms, and post-threshold signal processing becomes difficult. For this reason, theoretical developments of new image processing techniques for smart sensors give a great deal of attention to background clutter suppression techniques that enhance the signal to noise ratio before the thresholding step.

We have surveyed these techniques and present them in several classifications in Table I.2. First, they are classified as nonadaptive, open loop adaptive and closed loop adaptive.

- By "nonadaptive," we refer to approaches whose filters are not designed using the image characteristics.
- In the two adaptive cases, the filters are tailor-designed from characteristics learned from the images being processed.
- In the open loop adaptive case, the filter cannot update or correct itself when the characteristics of the image change. The image properties must be "relearned" before the filter can be redesigned.
- In the closed loop adaptive case, a feedback path is provided from the filter output to the input of the design process. Any change in the image characteristics therefore increases the output error, and this error is used to automatically update and correct the filter design.
TABLE I.2

FOCAL PLANE PROCESSING TECHNIQUES FOR BACKGROUND CLUTTER SUPPRESSION

FOCAL PLANE PROCESSING ALGORITHMS                          ACTIVE GROUPS

NONADAPTIVE, DETERMINISTIC
  Spatial
    1st order, 2nd order (Laplacian), 4th order
    nonrecursive spatial filter                             MIT Lincoln Laboratory*
  Temporal
    Frame-to-frame differencing (nonrecursive temporal
    filter): 1st and 2nd differencing; 3rd differencing     Grumman, Rockwell, Hughes
  Spatial-temporal
    Three-dimensional spatial-temporal filter by
    variational method                                      Rockwell
    Pseudo-reticle nonrecursive spatial filter followed
    by recursive temporal bandpass or highpass filter       Optical Science
  Spatial-spectral
    Nonrecursive spatial filter followed by two-color
    discrimination                                          MIT Lincoln Laboratory*

OPEN LOOP ADAPTIVE, DETERMINISTIC
  Spatial
    Background normalization (localized adaptive
    threshold)                                              General Electric*
  Temporal
    Bandpass filter followed by adaptive threshold          Aerojet ElectroSystems*
    2nd, 3rd order recursive temporal highpass filter       Rockwell

OPEN LOOP ADAPTIVE, STATISTICAL
  Spatial
    Minimization of mean square error:
      Recursive Kalman filter (spatial)                     Grumman*, NPGS
      Nonrecursive Wiener filter (spatial)                  Lockheed, NPGS
    Maximization of signal to noise ratio:
      Nonrecursive spatial matched filter                   MIT Lincoln Laboratory*, NPGS
    Maximization of likelihood ratio                        Aerospace Corp.
  Temporal
    Minimization of mean square error:
      Nonrecursive temporal Wiener filter                   Lockheed, NPGS
      Recursive temporal Kalman filter
    Maximization of signal to noise ratio                   Hughes, NPGS
  Spatial-temporal
    Minimization of mean square error                       NPGS
    Maximization of signal to noise ratio                   NPGS

CLOSED LOOP ADAPTIVE, STATISTICAL
  Spatial
    Minimization of mean square error:
      Nonrecursive spatial filter                           NPGS
    Maximization of signal to noise ratio                   NPGS
  Temporal
    Minimization of mean square error                       NPGS
    Maximization of signal to noise ratio                   NPGS

* Techniques developed for tactical systems
These approaches are further classified as deterministic or statistical. In deterministic cases, the filter design is based on non-statistical properties of the image, such as its frequency characteristics. In statistical cases, the filter design is based on statistical properties of the image, such as its autocorrelation or power spectral density.

Finally, they are classified by the type of signal processing operation used: spatial, temporal, spectral, or a combination of these.

2. Open Loop Adaptive Filter

Several nonrecursive open loop adaptive filters have been developed in our research group.

- D. Bar Yehoshua [4] first developed nonrecursive statistical spatial filters designed by the minimization of mean squared error criterion. He used theoretically generated images based on first and second order Markov models, all assumed to have zero mean.
- D. Hilmers [5] extended these spatial filters to process real world images, which have nonzero mean. He also extended the same concept to nonrecursive statistical temporal filters.
- B. Evenor [6] made two additional extensions. First, he developed design procedures for spatial filters based on the maximization of signal to noise ratio. Second, he developed a closed loop adaptive spatial filter by extending the LMS (least mean square) algorithm used by many one dimensional adaptive filter researchers. This filter is discussed further in the next section.
Tests on several real world infrared test images have shown these open loop adaptive filters to be very effective in suppressing background clutter for point targets. However, they do not respond to changes in the characteristics of the image being processed.

3. Closed Loop Adaptive Filter and this Thesis

This lack of true adaptive capability led to the study by B. Evenor [6]. He developed the nonrecursive closed loop adaptive spatial filter based on the "LMS" algorithm and tested it on theoretically generated images using Markov models. However, it was found that the LMS algorithm is actually a simplified version of a more general and powerful family of closed loop adaptive filters. It was therefore decided that the first part of this thesis would develop such a general adaptive filter approach, which includes:

- Two optimization criteria:

     Minimization of mean square error
     Maximization of signal to noise ratio

- A general adaptation equation using gradient search models

- A family of nonlinear search techniques to carry out the adaptation process.

The details of this theoretical study are presented in Chapter II.
C. IMPLEMENTATION OF THE IMAGE PROCESSING PROGRAM
BY A MULTIPLE MICROCOMPUTER SYSTEM

1. Introduction

A parallel effort has investigated the practical implementation of the statistical nonadaptive image processing algorithms developed in our research group. G. Hilimitzas [7] first investigated the execution speed and accuracy of these image processing algorithms on a mainframe computer, the IBM 360/67.

2. Microcomputer Implementation

D. Becker [8] investigated implementations of the nonadaptive image processing algorithms on two configurations:

- a single 16-bit LSI-11 microcomputer;
- this LSI-11 microcomputer combined with a microcomputer-compatible CDA-MSP-3 array processor.

He found that, with high-order language programming and floating point data formats, microcomputer implementation is still in its infancy. Its execution speed is slow and nowhere near real-time processing requirements. Improvements based on assembly language programming, integer data formats and better programming of the array processor are currently being developed.

3. Multiple Microcomputer Implementation and this Thesis

Achieving real-time image processing performance with microcomputers clearly requires several improvements to be pursued simultaneously.

- First, the processing capability of individual microcomputers must be improved, both by more imaginative programming and by attached special processors such as the array processor.
- Second, and probably much more important, the system should exploit the rapidly increasing number of microcomputers it can afford. These microcomputers must be orchestrated into an effective concurrent parallel and pipelined execution of the whole image processing program.

The advantages of this type of multiple microcomputer approach go beyond faster execution. They also include multitasking and higher reliability through better fault tolerance. To fully meet the research needs of successful smart sensor development, it was decided that the second part of this thesis should address the implementation of image processing algorithms by a multiple microcomputer system. Its details are presented in Chapter III.
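The sketch below makes partitioning concrete. It splits one 3 by 3 spatial filtering pass into horizontal strips that could be handed to separate processors. It is our illustration only, not the Chapter III design. The following are all assumptions:

- Python threads stand in for microcomputers.
- The helper names `filter_3x3` and `filter_partitioned` are hypothetical.
- The one-row halo scheme and the worker count are our own choices.

The property it is meant to exhibit is that the partitioned result matches the unpartitioned one exactly.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def filter_3x3(image, h):
    """Apply a 3x3 nonrecursive spatial filter h to every interior pixel.

    Output pixel (r, c) is the weighted sum of the 3x3 neighbourhood centred on
    input pixel (r + 1, c + 1); border pixels without a full neighbourhood are dropped.
    """
    rows, cols = image.shape
    out = np.zeros((rows - 2, cols - 2))
    for dr in range(3):
        for dc in range(3):
            out += h[dr, dc] * image[dr:dr + rows - 2, dc:dc + cols - 2]
    return out


def filter_partitioned(image, h, n_workers=4):
    """Filter the image in horizontal strips, one strip per (hypothetical) processor."""
    out_rows = image.shape[0] - 2
    bounds = np.linspace(0, out_rows, n_workers + 1).astype(int)
    # Output rows a..b-1 need input rows a..b+1: the strip plus a one-row halo on each side.
    strips = [image[a:b + 2] for a, b in zip(bounds[:-1], bounds[1:]) if b > a]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda strip: filter_3x3(strip, h), strips))
    return np.vstack(results)
```

For a 3 by 3 filter, each strip needs only one extra input row above and below its output rows. Those halo rows are the only data that neighbouring processors must share.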

D. SCOPE AND EXTENSION OF THIS THESIS 

It should be strongly emphasized that although this thesis 
specifically developed a family of adaptive spatial filters 
for the enhancement of target signal to noise ratio of images 
and a multiple microcomputer system for the implementation 
of the image processing, the motivation of this thesis is 
to contribute to the development of smart sensor systems. 
Therefore, the adaptive filter concepts and design techniques are not limited to spatial filters. They can readily be extended to a wide class of problems with poor signal to noise ratios. Likewise, the implementation issue is not limited to adaptive filter processing. The multiple microcomputer system is designed to implement not only the mission signal processing but also a host of other signal/data processing tasks for management, command, control and communication functions.
II. ADAPTIVE IMAGE PROCESSING



A. INTRODUCTION 
1. General 



The idea of an adaptive filter is inherently attractive. It takes little imagination to see the many advantages of a filter that automatically updates itself when it is not performing according to an optimum criterion. The development of adaptive filters started in the early 1960s, as an extension of sampled data control systems [9] and for adaptive antenna applications [10]. In the following years, many investigations addressed applications in antennas [11], noise cancellation [12] and a variety of other filtering problems [13-48].

Adaptive filter concepts are naturally very attractive for the objective of this thesis: detecting very dim targets deeply buried in infrared background clutter. However, a survey of adaptive filter research published in the 1970s reveals the following facts:

a. Practically all past adaptive filter research dealt with one dimensional problems.

b. The LMS (least mean square) error has been the most widely used criterion. Very little attention has been given to other criteria, such as the maximization of output signal to noise ratio, which is probably better suited to detection problems.

c. Very little attention has been given to the convergence speed of adaptive filters.

We therefore decided to address these three issues by developing new adaptive image processing techniques with three properties:

- they are multi-dimensional;
- they use either the mMSE (minimization of mean square error) or the MSNR (maximization of signal to noise ratio) criterion;
- they use a family of nonlinear convergence techniques from the optimization field to search for the extremum in the adaptive process.

First, however, the basic concept of the adaptive filter and the traditional LMS approach are briefly reviewed. This review is the starting point for introducing the new techniques developed in this thesis.

2. Basic Concepts of Adaptive Filters

The basic concepts of an adaptive filter can be described concisely as follows.

The filter is represented by a vector H. In an adaptive filter, H is updated in successive iteration steps, indexed by a subscript K. A correction term, $\Delta H_K$, is generated in each iteration step such that

$$H_{K+1} = H_K + \Delta H_K \qquad (1.0)$$

The iteration steps are carried out to optimize a selected performance function until the filter converges to its steady state. The steady state corresponds to reaching an extremum of the performance function surface.



The filter H could be a temporal filter or a spatial filter. It could be either of two types:

- a recursive filter, also called an infinite impulse response (IIR) or pole-zero filter;
- a nonrecursive filter, also called a finite impulse response (FIR) or all-zero filter.

The performance function could be the mean square error, the output signal to noise ratio, or another function such as the likelihood ratio. The optimization objective could be either minimization or maximization.

In this thesis, nonrecursive two dimensional spatial filters are considered. Two types of performance functions are used. Their optimization objectives are shown in Table II.0.



TABLE II.0

OBJECTIVE FUNCTIONS

Adaptive Filter    Performance Function            Optimization Goal
mMSE               Mean square error               Minimization
MSNR               Output signal to noise ratio    Maximization



Consider, for example, a nonrecursive spatial filter with a filter area of 3 by 3 pixels, which has nine filter coefficients. The performance function is then a surface over a nine dimensional filter coefficient space. The iterative adaptation procedure searches this coefficient space for the coordinates of the extreme point (either a minimum or a maximum) of the performance function surface.
3. Traditional Approach - LMS Algorithm

Most past adaptive filter studies followed the approach originated by Professor B. Widrow [14], commonly known as the LMS (least mean square) algorithm.

The performance function used in this approach is the "mean square error," and the optimization goal is "minimization." Prof. Widrow proposed that the adaptation term $\Delta H$ be expressed as:

$$\Delta H = 2\mu e X$$

where
    $X$ = signal being processed
    $2\mu$ = a constant, called the adaptive gain
    $e$ = adaptation error = $d - H^T X$
    $d$ = reference (or desired) signal
    $H$ = filter coefficient vector.

The adaptation equation is then

$$H_{K+1} = H_K + 2\mu e X$$

A steepest descent search technique is used to perform the adaptation steps.
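Below is a minimal sketch of the LMS update above. It assumes a one dimensional tapped-delay-line filter identifying an unknown three-tap system from synthetic, noise-free data. The names `lms_identify` and `h_true` and the gain `mu = 0.01` are hypothetical choices for illustration, not values from this thesis.

```python
import numpy as np


def lms_identify(x, d, n_taps, mu):
    """LMS adaptation H <- H + 2*mu*e*X with e = d - H^T X (one dimensional case)."""
    h = np.zeros(n_taps)
    errors = []
    for k in range(n_taps - 1, len(x)):
        X = x[k - n_taps + 1:k + 1][::-1]  # current input vector, newest sample first
        e = d[k] - h @ X                    # adaptation error
        h = h + 2.0 * mu * e * X            # constant-gain correction term
        errors.append(e)
    return h, np.array(errors)


# Hypothetical test case: identify an unknown 3-tap filter from noise-free data.
rng = np.random.default_rng(0)
h_true = np.array([0.5, -0.3, 0.2])
x = rng.standard_normal(5000)
d = np.convolve(x, h_true)[:len(x)]         # d[k] = sum_j h_true[j] * x[k - j]
h_est, err = lms_identify(x, d, n_taps=3, mu=0.01)
```

With a constant μ, the convergence rate is fixed by μ and the input statistics. If noise were added to d, the coefficients would keep fluctuating about h_true rather than settling.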

Although most adaptive filter researchers have used this traditional LMS approach, it has certain drawbacks, which are briefly described below.

The adaptation equation used in the traditional approach can be considered a special case of a more general adaptation equation,

$$H_{K+1} = H_K + \alpha_K G_K$$

an equation commonly used in the field of optimization.

- The term $G_K$ is sometimes called the "gradient," meaning the gradient of the performance function surface. For a minimization, it points along the negative gradient.
- The term $\alpha_K$ is sometimes called the "step size," meaning the displacement in the vector space H.

At iteration step K+1, the optimization procedure gives a filter vector $H_{K+1}$ that is closer to the optimal vector $H^*$ than the previous filter vectors. Prof. Widrow's imaginative proposal can therefore be interpreted as the following two assumptions:

$$G_K = 2eX$$

$$\alpha_K = \mu = \text{a constant}$$

These two bold assumptions may cause several inherent limitations.

a. Because the gradient $G_K$ is not tailored to the performance function, convergence could be slow. Further, the steady state filter may not yield the best estimate, and a steady state misadjustment could exist [24].

b. Because the step size $\alpha_K$ is assumed to be a constant, the adaptation procedure may never reach a steady state.

4. This Thesis Research

The survey and review presented above identify a series of research problems. These must be investigated in order to develop adaptive image processing techniques that suppress background clutter in infrared images and aid the detection of dim targets.

First, we must extend the one dimensional adaptive filter techniques based on the mMSE criterion to two dimensions.

Second, we should develop an adaptive filter based on the MSNR criterion, which is derived in section B.

Third, we should develop a new adaptation equation that is more responsive to the performance function. The aims are to improve convergence speed and to minimize steady state misadjustment. In other words, the adaptation equation is of the form

$$H_{K+1} = H_K + \alpha_K G_K$$

Here, the step size $\alpha_K$ and the gradient $G_K$ do not take the forms $\mu$ (a constant) and $2eX$. Those forms are customary in practically all past adaptive filter studies based on the LMS algorithm.

Fourth, we will investigate a variety of nonlinear gradient techniques to search for the minimum in the case of the mMSE filter and the maximum in the case of the MSNR filter. They are derived and presented in sections C and D, respectively.
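The sketch below illustrates why a variable step size $\alpha_K$ matters. It runs steepest descent with an exact line search on the quadratic mMSE surface (2.10), assuming $R_{XX}$ and $R_{XS}$ are known. This is a generic textbook method chosen by us, not necessarily one of the techniques derived in sections C and D, and the function names are hypothetical. For a quadratic J, take $G = -\nabla J$ and set $dJ(H + \alpha G)/d\alpha = 0$. This gives $\alpha = G^T G / (2 G^T R_{XX} G)$.

```python
import numpy as np


def mmse_cost(h, R_xx, R_xs, d):
    """Quadratic mMSE performance function J = H^T R_XX H - 2 H^T R_XS + d, as in (2.10)."""
    return h @ R_xx @ h - 2.0 * h @ R_xs + d


def steepest_descent_exact(R_xx, R_xs, h0, n_iter=200, tol=1e-12):
    """H_{K+1} = H_K + alpha_K G_K with G_K = -grad J and an exactly optimal alpha_K."""
    h = np.array(h0, dtype=float)
    for _ in range(n_iter):
        g = -2.0 * (R_xx @ h - R_xs)        # search direction: negative gradient of J
        gg = g @ g
        if gg < tol:                        # gradient (almost) zero: extremum reached
            break
        alpha = gg / (2.0 * g @ R_xx @ g)   # step that minimizes J(H + alpha*G)
        h = h + alpha * g
    return h


# Hypothetical well-conditioned 9x9 example (a 3 by 3 search box).
rng = np.random.default_rng(1)
A = rng.standard_normal((9, 9))
R_xx = A @ A.T / 9 + np.eye(9)
R_xs = rng.standard_normal(9)
h = steepest_descent_exact(R_xx, R_xs, np.zeros(9), n_iter=1000)
```

Unlike a fixed μ, the step adapts to the local curvature along each search direction. Each step therefore reduces J as much as possible along that direction.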

The results of applying these adaptive spatial filters 
to two real infrared images will be presented in section F. 
B. DERIVATION OF OPTIMIZATION CRITERIA

1. Performance Function I - mMSE

The performance function based on the mMSE criterion is derived for the nonrecursive spatial/temporal filter. The nonrecursive spatial and temporal filters are described by a set of filter coefficients, the vector H, over the area of a "search box"¹. The observed signal in the ith search box is represented by the signal vector $X_i$. The estimated target intensity within the search box is obtained by the linear filter

$$\hat{S}_i = H^T X_i \qquad (2.00)$$

This process is carried out throughout the whole image.

The nonrecursive filter is represented by the vector²

$$H^T = [H(1), H(2), \ldots, H(N)] \qquad (2.01)$$

where N is the number of pixels in the filter search box.

The image signal within the ith filter search box is described by the vector

$$X_i^T = [X_i(1), X_i(2), \ldots, X_i(N)] \qquad (2.02)$$

Throughout this thesis, matrices are denoted by a "~" under the symbol and vectors by a "_" under the symbol.
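The sketch below shows one way to form the search-box vectors $X_i$ and the filter output (2.00) for a 3 by 3 box. It makes two assumptions: boxes are taken at every position fully inside the image, and each box is flattened in row-major order. That ordering and the helper names are our own choices.

```python
import numpy as np


def search_box_vectors(image, m=3):
    """Return every m x m search box fully inside the image as an N = m*m vector X_i."""
    rows, cols = image.shape
    boxes = []
    for r in range(rows - m + 1):
        for c in range(cols - m + 1):
            boxes.append(image[r:r + m, c:c + m].ravel())   # row-major flattening
    return np.array(boxes)                                   # shape (num_boxes, N)


def apply_filter(image, h, m=3):
    """Estimated target intensity S_hat_i = H^T X_i (2.00) for every search box."""
    X = search_box_vectors(image, m)
    rows, cols = image.shape
    return (X @ h).reshape(rows - m + 1, cols - m + 1)
```

For example, a filter whose only nonzero coefficient is the centre pixel, `H(5) = 1`, returns the interior of the image unchanged.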

The estimation error is defined as:

$$e_i = S_i - \hat{S}_i \qquad (2.03)$$

where $S_i$ is the signal and $\hat{S}_i$ the estimated signal in the ith search box.

The mMSE (minimization of mean square error) performance function is defined as:

$$J \triangleq E[e_i^2] \qquad (2.04)$$

¹ See Fig. 2.0.
² T denotes the transpose of the vector.



where E[·] denotes the expected value. Substitution of (2.00) and (2.03) into (2.04) gives:

$$J = E[(H^T X_i - S_i)(H^T X_i - S_i)^T] = E[H^T X_i X_i^T H - 2H^T X_i S_i + S_i^2] \qquad (2.05)$$

Since the filter H is fixed for a given image, it can be moved outside the expectation operation to give:

$$J = H^T E[X_i X_i^T] H - 2H^T E[X_i S_i] + E[S_i^2] \qquad (2.06)$$

To simplify (2.06), the following terms are defined:

(1) The autocorrelation matrix of the observed image is:

$$R_{XX} \triangleq E[X_i X_i^T] \qquad (2.07)$$

Being a correlation matrix, it is symmetric and positive semidefinite. It is positive definite whenever it is nonsingular.

(2) The cross correlation vector between the observed signal and the target signal of interest is:

$$R_{XS} \triangleq E[X_i S_i] \qquad (2.08)$$

(3) The mean square value of the target signal is:

$$d \triangleq E[S_i^2] \qquad (2.09)$$

Substitution of (2.07) through (2.09) into (2.06) gives:

$$J = H^T R_{XX} H - 2H^T R_{XS} + d \qquad (2.10)$$

Equation (2.10) is the performance function of the mMSE criterion. It is a quadratic function of the filter vector H.

Figure 2.0  Search box.



Theorem 2.01

The performance function (2.10) is a unimodal (i.e., has a single minimum) function if the autocorrelation matrix $R_{XX}$ is positive definite.

Proof

The stationary points of the function (2.10) are found by setting the gradient of (2.10) with respect to H to zero:

$$\nabla_H J = 2R_{XX} H - 2R_{XS} = 0 \qquad (2.11)$$

Since $R_{XX}$ is a symmetric positive definite matrix, its inverse exists. Therefore

$$H^* = R_{XX}^{-1} R_{XS} \qquad (2.12)$$

Equation (2.12) gives the optimum filter vector, which minimizes the cost function (2.10). To prove that the cost function is minimized at $H^*$, the second gradient of (2.10) with respect to H is taken:

$$\nabla_H^2 J = 2R_{XX} \qquad (2.13)$$

Since $R_{XX}$ is positive definite, the cost function is minimized. The minimum value is

$$J_{\min} = d - R_{XS}^T R_{XX}^{-1} R_{XS} = d - R_{XS}^T H^* \qquad (2.14)$$

It is obtained by substituting the optimum filter vector $H^*$ into (2.10).

The second derivative of the cost function J, given in (2.13), is called the Hessian matrix.
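The sketch below carries out the open loop design implied by (2.07) through (2.14). It assumes that expectations are replaced by sample averages over search boxes, and that the target value $S_i$ for each box is available (here it is synthetic). Under sample averaging, $J_{\min}$ from (2.14) equals exactly the sample mean squared error of the optimum filter. The function name `wiener_filter` is hypothetical.

```python
import numpy as np


def wiener_filter(X, S):
    """Open loop mMSE design H* = R_XX^{-1} R_XS (2.12) and J_min (2.14).

    X: (num_boxes, N) array of search-box vectors X_i; S: (num_boxes,) target values S_i.
    Expectations (2.07)-(2.09) are replaced by sample averages.
    """
    n = len(S)
    R_xx = X.T @ X / n                    # (2.07)
    R_xs = X.T @ S / n                    # (2.08)
    d = S @ S / n                         # (2.09)
    h_opt = np.linalg.solve(R_xx, R_xs)   # (2.12), solved without forming the inverse
    j_min = d - R_xs @ h_opt              # (2.14)
    return h_opt, j_min, R_xx


# Hypothetical synthetic data: S_i is a noisy linear function of X_i.
rng = np.random.default_rng(2)
w = rng.standard_normal(9)
X = rng.standard_normal((2000, 9))
S = X @ w + 0.1 * rng.standard_normal(2000)
h_opt, j_min, R_xx = wiener_filter(X, S)
```

The code uses `np.linalg.solve` rather than an explicit inverse. This is the usual numerically safer way to evaluate $R_{XX}^{-1} R_{XS}$.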

If the autocorrelation matrix is singular, the cost function (2.10) is no longer unimodal. In that case, (2.11) is satisfied by an infinite number of filter vectors H.

It can be shown [49] that in such a case a minimal solution can be obtained [50, 51] by using the pseudo inverse $R_{XX}^{+}$ of $R_{XX}$:

$$H^* = R_{XX}^{+} R_{XS}$$

The solution is not unique. Adding to $H^*$ any vector in the null space of $R_{XX}$ gives the same value of J, and the pseudo inverse selects the minimizer of minimum norm.
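The following is a small numerical illustration of the singular case, assuming a synthetic 9 by 9 $R_{XX}$ of rank 4. $R_{XS}$ is taken in the range of $R_{XX}$, as it always is for a true correlation pair. The reason is that if $R_{XX} v = 0$, then $v^T X = 0$ with probability one, so $v^T R_{XS} = 0$. NumPy's `np.linalg.pinv` supplies the Moore-Penrose pseudo inverse. The cutoff `rcond=1e-10`, the value of `d` and the other names are our choices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((9, 4))
R_xx = A @ A.T                              # rank 4: a singular 9x9 autocorrelation matrix
R_xs = R_xx @ rng.standard_normal(9)        # lies in the range of R_xx
d = 10.0                                    # hypothetical mean square target value


def cost(h):
    """mMSE performance function (2.10)."""
    return h @ R_xx @ h - 2.0 * h @ R_xs + d


h_pinv = np.linalg.pinv(R_xx, rcond=1e-10) @ R_xs   # minimum-norm minimizer
null_dir = np.linalg.svd(R_xx)[2][-1]              # right singular vector with R_xx @ v ~ 0
h_other = h_pinv + 3.0 * null_dir                  # a different filter with the same J
```

Both filters give the same minimum cost. The pseudo inverse solution has the smaller norm because it has no component in the null space.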
2. Performance Function II - MSNR

The observed signal in the search box is represented by the vector X. Let us assume that the target signal vector S and the clutter noise vector N are additive:

$$X = S + N \qquad (2.15)$$

Applying the linear filter H to the input signal vector X, we obtain:

$$H^T X = H^T (S + N) = H^T S + H^T N \qquad (2.16)$$

Let us define the following terms:

$$S_o \triangleq H^T S = \text{target signal after filtering} \qquad (2.17)$$

$$N_o \triangleq H^T N = \text{clutter noise after filtering} \qquad (2.18)$$

The output signal to clutter noise ratio is then defined as:

$$J \triangleq \frac{\text{power in the filtered image } H^T X \text{ due to the target signal}}{\text{power in the filtered image } H^T X \text{ due to the clutter noise}} \qquad (2.19)$$

$$J = \frac{E[S_o^2]}{E[N_o^2]} \qquad (2.20)$$

where E[·] denotes the expected value. Substitution of (2.17) and (2.18) into (2.20) gives:

$$J = \frac{E[(H^T S)^2]}{E[(H^T N)^2]} = \frac{E[H^T S S^T H]}{E[H^T N N^T H]} \qquad (2.21)$$

The filter vector H can be taken out of the expectation operation:

$$J = \frac{H^T E[S S^T] H}{H^T E[N N^T] H} \qquad (2.22)$$
Let us define the signal autocorrelation matrix as:

$$R_{SS} \triangleq E[S S^T] \qquad (2.23)$$

and the clutter noise autocorrelation matrix as:

$$R_{NN} \triangleq E[N N^T] \qquad (2.24)$$

Both $R_{SS}$ and $R_{NN}$ are symmetric and positive semidefinite. $R_{NN}$ is assumed to be positive definite. $R_{SS}$ may be singular; for a single known target it has rank one, as used in Theorem 2.04. Substitution of (2.23) and (2.24) into (2.22) yields:

$$J = \frac{H^T R_{SS} H}{H^T R_{NN} H} \qquad (2.25)$$

Equation (2.25) is the performance function of the MSNR criterion.

The filter vector H is obtained by maximizing J in (2.25) with respect to H.
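The sketch below evaluates (2.25) for a given filter. It assumes a hypothetical single point target at the centre of a 3 by 3 box and synthetic correlated clutter. $R_{NN}$ is estimated as the sample average of $(X - S)(X - S)^T$, and $R_{SS} = S S^T$ for the fixed target. It also shows that J is unchanged when H is scaled, which is why only the direction of H matters in the maximization.

```python
import numpy as np


def msnr_cost(h, R_ss, R_nn):
    """MSNR performance function (2.25): J = (H^T R_SS H) / (H^T R_NN H)."""
    return (h @ R_ss @ h) / (h @ R_nn @ h)


rng = np.random.default_rng(3)
N = 9
s = np.zeros(N)
s[4] = 1.0                                       # hypothetical point target at the box centre
mix = rng.standard_normal((N, N))
X = s + rng.standard_normal((5000, N)) @ mix     # additive model (2.15) with correlated clutter
R_ss = np.outer(s, s)                            # E[S S^T] for a fixed target: a dyad
R_nn = (X - s).T @ (X - s) / len(X)              # sample estimate of E[(X - S)(X - S)^T]
h = rng.standard_normal(N)
j = msnr_cost(h, R_ss, R_nn)
```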



Theorem 2.02

The maximum of the objective function (2.25) is equal to the largest eigenvalue of the matrix $R_{NN}^{-1} R_{SS}$, and the optimum filter $H^*$ is the corresponding eigenvector.

Proof

The proof finds the upper bound of J by using the Cauchy-Schwarz inequality.

Since the autocorrelation matrix $R_{NN}$ is symmetric and positive definite, there exists a square nonsingular matrix V that satisfies the relation [52]

$$R_{NN} = V^T V \qquad (2.26)$$
Substitution of (2.26) into (2.25), using the fact that

$$ V^{-1}V = V^{T}V^{-T} = I \tag{2.27} $$

gives

$$ J = \frac{H^{T}\left(V^{T}V^{-T}\right)R_{SS}\left(V^{-1}V\right)H}{H^{T}V^{T}VH} \tag{2.28} $$

Let us define the normalized vector W as

$$ W \triangleq \frac{VH}{\lVert VH \rVert} = \frac{VH}{\sqrt{(VH)^{T}(VH)}} \tag{2.29} $$

which also satisfies the normalization condition

$$ W^{T}W = 1 \tag{2.30} $$

Substitution of (2.29) and (2.30) into (2.28) gives

$$ J = W^{T}V^{-T}R_{SS}V^{-1}W \tag{2.31} $$

Let us define the matrix P as

$$ P \triangleq V^{-T}R_{SS}V^{-1} \tag{2.32} $$

Equation (2.31) becomes

$$ J = W^{T}PW \tag{2.33} $$

Using the Schwarz inequality, we obtain

$$ \left(W^{T}PW\right)^{2} \le \left(W^{T}W\right)\left(W^{T}P^{T}PW\right) \tag{2.34} $$

The left side of the inequality is equal to $J^{2}$ and $W^{T}W = 1$, so the right side of (2.34) is an upper bound of $J^{2}$. The performance function J reaches its maximum when the equality holds, which occurs when

$$ W = a \cdot PW \tag{2.35} $$

where a is a constant. Substituting (2.29) and (2.32) into (2.35) gives

$$ \frac{VH}{\sqrt{(VH)^{T}(VH)}} = a \cdot V^{-T}R_{SS}V^{-1}\,\frac{VH}{\sqrt{(VH)^{T}(VH)}} \tag{2.36} $$

Multiplying (2.36) by

$$ V^{T}\sqrt{(VH)^{T}(VH)} \tag{2.37} $$

we obtain

$$ V^{T}VH = a \cdot V^{T}V^{-T}R_{SS}V^{-1}VH \tag{2.38} $$

Substituting (2.26) and (2.27) in (2.38), we get

$$ R_{NN}H = a \cdot R_{SS}H \tag{2.39} $$

Since $R_{NN}$ is a positive definite matrix, its inverse exists. Multiplying (2.39) by $\frac{1}{a}R_{NN}^{-1}$, we obtain

$$ \left(R_{NN}^{-1}R_{SS} - \frac{1}{a}I\right)H^{*} = 0 \tag{2.40} $$

where I is the identity matrix.

Equation (2.40) is called the generalized eigenvalue-eigenvector problem [52].

Substituting the $H^{*}$ of (2.40) into (2.25), we obtain the maximum value of J. From (2.39), $H^{*T}R_{NN}H^{*} = a \cdot H^{*T}R_{SS}H^{*}$, so one can see that

$$ J_{\max} = \frac{1}{a} = \lambda_{\max} \tag{2.41} $$

In other words, $J_{\max}$ is the largest eigenvalue $\lambda_{\max}$ of the matrix $R_{NN}^{-1}R_{SS}$, and $H^{*}$ is its corresponding eigenvector. The noise correlation matrix can be obtained by assuming some target signal of interest S and using the observed signal X in the following way (the signal and noise are assumed additive):

$$ R_{NN} = E\left[(X - S)(X - S)^{T}\right] $$

(Q.E.D.)
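Theorem 2.02 says that $J_{\max}$ is the largest eigenvalue of $R_{NN}^{-1}R_{SS}$ and that $H^{*}$ is the corresponding eigenvector. The sketch below checks this numerically. It is a minimal sketch that assumes small dense matrices stored as Python lists. It uses power iteration, a standard technique we substitute here because the thesis does not prescribe an eigen-solver. The function names (`solve`, `msnr`, `max_snr_filter`) are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(x) for x in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def msnr(h, r_ss, r_nn):
    """Performance function (2.25): J = H^T R_SS H / H^T R_NN H."""
    return dot(h, matvec(r_ss, h)) / dot(h, matvec(r_nn, h))


def max_snr_filter(r_ss, r_nn, h0, iters=200):
    """Power iteration on R_NN^{-1} R_SS; returns (J_max, H*) with ||H*|| = 1."""
    h = [float(x) for x in h0]
    for _ in range(iters):
        h = solve(r_nn, matvec(r_ss, h))   # h <- R_NN^{-1} R_SS h
        norm = dot(h, h) ** 0.5
        h = [x / norm for x in h]          # rescale; only the direction matters
    return msnr(h, r_ss, r_nn), h
```

Take $R_{NN} = \mathrm{diag}(2, 1, 4)$ and $R_{SS} = \mathrm{diag}(2, 3, 4)$. Then $R_{NN}^{-1}R_{SS} = \mathrm{diag}(1, 3, 1)$, and `max_snr_filter(r_ss, r_nn, [1, 1, 1])` should return $J$ close to 3 with $H^{*}$ along the second axis. Power iteration needs a unique largest eigenvalue and a starting vector with a nonzero component along $H^{*}$.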



Theorem 2.03

The performance function J in (2.25) is, in general, a multimodal function.

Proof

Based on Theorem 2.02, the stationary points of the performance function J satisfy the eigenvector equation (2.40):

$$ \left(R_{NN}^{-1}R_{SS} - \frac{1}{a}I\right)H^{*} = 0 $$

In general, this equation has n different solutions. The matrix $R_{NN}^{-1}R_{SS}$ can have n distinct eigenvalues and thus n corresponding eigenvectors. So, in general, the performance function can have one absolute maximum and n-1 smaller local maxima.

Theorem 2.04

The performance function J is unimodal (has a single maximum) if the matrices $R_{SS}$ and $R_{NN}$ are defined as in (2.23) and (2.24).

Proof

The proof is based on the fact that $R_{SS}$ is a dyad.

Use equation (2.40):

$$ \left(R_{NN}^{-1}R_{SS} - \frac{1}{a}I\right)H^{*} = 0 $$

The matrix $R_{SS}$, being a dyad, can be written as

$$ R_{SS} = r\,r^{T} \tag{2.42} $$

where r is a vector.

As mentioned before, for the nontrivial solution of (2.40) the performance function is

$$ J = \frac{1}{a} \tag{2.43} $$

Using (2.42) and (2.43) in (2.40), we obtain

$$ R_{NN}^{-1}\,r\,r^{T}H^{*} = J H^{*} \tag{2.44} $$

Separating the left side of (2.44) into the product of a vector and a scalar constant, we obtain

$$ \left(R_{NN}^{-1}r\right)\left(r^{T}H^{*}\right) = J H^{*} \tag{2.45} $$

For generality, a constant $\beta$ can be used in the left side of (2.45) to give

$$ \left(\beta R_{NN}^{-1}r\right)\left(\frac{1}{\beta}\,r^{T}H^{*}\right) = J H^{*} \tag{2.46} $$

Comparing both sides of (2.46), we get

$$ J_{\max} = \frac{1}{\beta}\,r^{T}H^{*} \tag{2.47} $$

$$ H^{*} = \beta R_{NN}^{-1}r \tag{2.48} $$

Substituting (2.48) into (2.47) gives $J_{\max} = r^{T}R_{NN}^{-1}r$, which does not depend on $\beta$.

Equation (2.47) shows that if $R_{SS}$ is a dyad, the performance function J has a single nonzero stationary value, which is its maximum, reached along the direction $H^{*}$ in (2.48). The generalized eigenvalue problem (2.40) has a single nonzero eigenvalue, $1/a = J_{\max}$, corresponding to the eigenvector $H^{*}$.

(Q.E.D.)
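Theorem 2.04 gives a closed form when $R_{SS} = r\,r^{T}$: $H^{*} = \beta R_{NN}^{-1}r$ and $J_{\max} = r^{T}R_{NN}^{-1}r$. The sketch below is a minimal illustration, assuming small dense matrices as Python lists. It computes $H^{*}$ with one linear solve and checks that the Rayleigh quotient (2.25) reaches $J_{\max}$ there for any nonzero $\beta$. The function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(x) for x in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def dyad_optimum(r, r_nn, beta=1.0):
    """Closed form of Theorem 2.04: returns (J_max, H*) for R_SS = r r^T."""
    w = solve(r_nn, r)                  # R_NN^{-1} r
    h_star = [beta * x for x in w]      # eq. (2.48)
    j_max = dot(r, w)                   # J_max = r^T R_NN^{-1} r
    return j_max, h_star


def msnr_dyad(h, r, r_nn):
    """Eq. (2.25) with R_SS = r r^T: J = (r^T H)^2 / H^T R_NN H."""
    return dot(r, h) ** 2 / dot(h, matvec(r_nn, h))
```

For example, take $R_{NN} = \begin{bmatrix} 2 & 0.5 \\ 0.5 & 1 \end{bmatrix}$ and $r = [1, 1]^{T}$. Then $H^{*} = [2/7, 6/7]^{T}$ and $J_{\max} = 8/7$. Any other direction, such as $[1, 0]^{T}$, gives a smaller J.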



C. DERIVATION OF SEARCHING TECHNIQUES FOR EXTREMUM:

GRADIENT SEARCH METHODS FOR THE MINIMUM OF THE mMSE
PERFORMANCE FUNCTION

1. Steepest Descent Method (SD) and
the Best Step Adaptation Gain

The steepest descent method is a gradient method. It uses the Jacobian gradient ($G = \nabla_{H}J$) of the performance function J to determine a suitable direction of search. Gradient methods which use the Jacobian to determine the search direction are called first-order methods. Gradient methods for optimization are based on the Taylor expansion of the performance function J, as given below:

$$ J(H + \Delta H) = J(H) + G^{T}\Delta H + \frac{1}{2}\,\Delta H^{T}A\,\Delta H \tag{2.49} $$

where G is the Jacobian gradient of J and A is the matrix of second-order partial derivatives, called the Hessian matrix. Equation (2.49) can be written in the form

$$ J(H + \Delta H) = J(H) + \Delta J \tag{2.50} $$



The steepest descent method uses only the Jacobian, so

$$ \Delta J = G^{T}\Delta H \tag{2.51} $$

In order to minimize the performance function J, we want to generate a descending sequence of J which finally converges to the minimum of J, $J^{*}$. In other words, we want a negative $\Delta J$, but

$$ \Delta J = \lVert G \rVert \cdot \lVert \Delta H \rVert \cdot \cos\varphi $$

where $\varphi$ is the angle between the two vectors G and $\Delta H$. For maximum reduction of the cost function J,

$$ \varphi = \pi \tag{2.52} $$

From (2.52), it is obvious that the change $\Delta H$ in the filter vector H should be in the direction of the negative gradient $-G$. This direction is called the steepest descent direction.

The steepest descent step $\Delta H$ can be written in the form

$$ \Delta H = -\alpha G \tag{2.53} $$

where $-G$ is called the step direction and $\alpha$ the step size. In adaptive filter terminology, $\alpha$ is called the adaptation gain.

In order to generate an iterative method, one can represent the filter vector $H + \Delta H$ as $H_{K+1}$. Thus,

$$ H_{K+1} = H_K + \Delta H_K \tag{2.54} $$

Substituting (2.53) in (2.54), we obtain

$$ H_{K+1} = H_K - \alpha_K G_K \tag{2.55} $$

Equation (2.55) is called the steepest descent iterative method. For simplicity, and without losing generality, the negative sign will be included in $\alpha_K$. Thus (2.55) becomes

$$ H_{K+1} = H_K + \alpha_K G_K \tag{2.56} $$

If very small values of $\alpha_K$ are selected, the sequence $\{H_K\}$ will converge very slowly. In order to increase the speed of convergence substantially, we choose the step sizes which provide the biggest descent at each step. This concept is called the "best step". The adaptation gain $\alpha_K$ is picked to minimize $J(H_{K+1})$. This choice of $\alpha_K$ constitutes a one-dimensional minimization of the performance function $J(H_{K+1})$.

Lemma 2.05

Let $J(H_K)$ be the performance function to be minimized, and let the filter vector be updated by the steepest descent method (2.56). Then the "best step" towards the minimum of J is obtained in every iteration if the adaptation gain $\alpha_K$ satisfies the relationship

$$ G_{K+1}^{T}G_K = 0 \tag{2.57} $$

where $G_K$ is the Jacobian gradient of J with respect to the filter vector H.

Proof

The task is to find the $\alpha_K$ which minimizes $J(H_{K+1})$. The performance function $J(H_{K+1})$ can be expressed as

$$ J(H_{K+1}) = J(H_K + \alpha_K G_K) \tag{2.58} $$

and the best step is found by setting

$$ \frac{\partial J(H_{K+1})}{\partial \alpha_K} = 0 \tag{2.59} $$

But $H_{K+1}$ is a function of $\alpha_K$, as shown in (2.56). Thus (2.59) becomes

$$ \left(\frac{\partial J}{\partial H_{K+1}}\right)^{T}\frac{\partial H_{K+1}}{\partial \alpha_K} = 0 \tag{2.60} $$

Since $\partial J/\partial H_{K+1} = \nabla_{H}J(H_{K+1}) = G_{K+1}$ and $\partial H_{K+1}/\partial \alpha_K = G_K$, (2.60) becomes

$$ G_{K+1}^{T}G_K = 0 \tag{2.61} $$

From (2.61), the best step concept requires orthogonality between the two gradient vectors $G_{K+1}$ and $G_K$.

(Q.E.D.)

Up to this point, the cost function J was not specified, and the derivation of the steepest descent method holds for any continuously differentiable function.

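The mMSE performance function, as given by (2.10), can be written as

$$ J_K = H_K^{T}R_{XX}H_K - 2H_K^{T}R_{XS} + \text{const.} \tag{2.62} $$

where the last term does not depend on $H_K$.

The gradient $G_K$ of $J_K$ with respect to $H_K$ is given by

$$ G_K = 2\left(R_{XX}H_K - R_{XS}\right) \tag{2.63} $$

From (2.63) and (2.56), $G_{K+1}$ can be expressed as

$$ \begin{aligned} G_{K+1} &= 2\left(R_{XX}H_{K+1} - R_{XS}\right) \\ &= 2\left(R_{XX}(H_K + \alpha_K G_K) - R_{XS}\right) \\ &= 2\left(R_{XX}H_K - R_{XS}\right) + 2\alpha_K R_{XX}G_K \\ &= G_K + 2\alpha_K R_{XX}G_K \end{aligned} \tag{2.64} $$

Lemma 2.05 is used with (2.64) to compute the best step $\alpha_K$. Since $G_{K+1}^{T}G_K = 0$ by Lemma 2.05, (2.64) gives

$$ \left(G_K + 2\alpha_K R_{XX}G_K\right)^{T}G_K = 0 $$

$$ G_K^{T}G_K + 2\alpha_K G_K^{T}R_{XX}^{T}G_K = 0 \tag{2.65} $$

$R_{XX}$ is a symmetric matrix, thus $R_{XX}^{T} = R_{XX}$. Using this fact in (2.65), we obtain

$$ \alpha_K = -\frac{1}{2}\,\frac{G_K^{T}G_K}{G_K^{T}R_{XX}G_K} \tag{2.66} $$

Equation (2.66) is the equation of the best step for the steepest descent method.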

Combining the results from (2.56), (2.63) and (2.66), we obtain the steepest descent adaptive filter:

[Step 1] Set a starting filter vector $H_0$, the stopping bound (i.e., the maximum acceptable adaptation error) $\varepsilon$, the correlation matrix $R_{XX}$ of the observed signal, and the cross-correlation vector $R_{XS}$.

[Step 2] Compute the gradient:

$$ G_K = 2\left(R_{XX}H_K - R_{XS}\right) $$

[Step 3] Compute the adaptation gain:

$$ \alpha_K = -\frac{1}{2}\,\frac{G_K^{T}G_K}{G_K^{T}R_{XX}G_K} $$

[Step 4] Update the filter vector:

$$ H_{K+1} = H_K + \alpha_K G_K $$

[Step 5] Test for the stopping condition: If $G_K^{T}G_K \le \varepsilon$, then terminate. Otherwise, go to Step 2.

The stopping criterion is chosen as $G_K^{T}G_K \le \varepsilon$ because the performance function is unimodal (has a single stationary point), and we are looking for that stationary point, which satisfies the vanishing of the gradient.
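The five steps above translate almost line for line into code. The sketch below is a minimal version of the SD adaptive filter, assuming $R_{XX}$ is symmetric positive definite and stored as a list of rows. It differs from the listing in one design choice: the stopping test (Step 5) runs before the gain is computed, which avoids a 0/0 division once the gradient vanishes. The function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def grad_mmse(r_xx, r_xs, h):
    """Gradient (2.63) of the mMSE function: G = 2 (R_XX H - R_XS)."""
    return [2.0 * (a - b) for a, b in zip(matvec(r_xx, h), r_xs)]


def sd_mmse(r_xx, r_xs, h0, eps=1e-20, max_iter=10_000):
    """Steepest descent with the best-step gain (2.66); returns (H, iterations)."""
    h = [float(x) for x in h0]
    for k in range(max_iter):
        g = grad_mmse(r_xx, r_xs, h)                    # Step 2
        gg = dot(g, g)
        if gg <= eps:                                   # Step 5, tested before Step 3
            return h, k
        alpha = -0.5 * gg / dot(g, matvec(r_xx, g))     # Step 3, eq. (2.66)
        h = [hi + alpha * gi for hi, gi in zip(h, g)]   # Step 4, eq. (2.56)
    return h, max_iter
```

With $R_{XX} = \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix}$ and $R_{XS} = [6, 7]^{T}$, the optimum is $H^{*} = [1, 2]^{T}$. The gradients of two consecutive iterates are orthogonal, as Lemma 2.05 predicts.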

2. Accelerated Steepest Descent Method (ASD)

The accelerated steepest descent method was first introduced in 1964 by Shah, Buehler and Kempthorne [53]. Its purpose was to accelerate the convergence of the standard steepest descent method. The concept was incorporated into an algorithm which converges to the minimum of any n-dimensional quadratic function in no more than $2n - 1$ steps. In practice, this algorithm is not very efficient because of its sensitivity to error propagation. For large n, error propagation affects the convergence rate, and the method sometimes converges as slowly as, or even more slowly than, the steepest descent method.



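The adaptation gain of the ASD method is computed using Lemma 2.05 and the fact that the adaptive filter is updated by the iterative equation

$$ H_{K+1} = H_K + \alpha_K V_K \tag{2.67} $$

where $V_K$ is the step direction vector. The proof of Lemma 2.05 carries over with $G_K$ replaced by $V_K$, since $\partial H_{K+1}/\partial \alpha_K = V_K$. Hence, from Lemma 2.05 and (2.67),

$$ G_{K+1}^{T}V_K = 0 \tag{2.68} $$

Using (2.63),

$$ \begin{aligned} G_{K+1} &= 2\left(R_{XX}H_{K+1} - R_{XS}\right) \\ &= 2\left(R_{XX}(H_K + \alpha_K V_K) - R_{XS}\right) \\ &= G_K + 2\alpha_K R_{XX}V_K \end{aligned} \tag{2.69} $$

Using (2.69) and (2.68), we obtain

$$ \left(G_K + 2\alpha_K R_{XX}V_K\right)^{T}V_K = 0 \tag{2.70} $$

$$ \alpha_K = -\frac{1}{2}\,\frac{G_K^{T}V_K}{V_K^{T}R_{XX}V_K} \tag{2.71} $$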

The accelerated steepest descent adaptive filter is carried out by the following steps:

[Step 1] Set a starting filter vector $H_0 = H_1$, the stopping bound $\varepsilon$, the correlation matrix $R_{XX}$ and the cross-correlation vector $R_{XS}$.

[Step 2] Compute the gradient:

$$ G_K = 2\left(R_{XX}H_K - R_{XS}\right) $$

[Step 3] Compute the step direction vector:

$$ V_K = \begin{cases} -G_K, & K = 2, 4, 6, \ldots \\ H_K - H_{K-2}, & K = 3, 5, 7, \ldots \end{cases} $$

[Step 4] Compute the adaptation gain:

$$ \alpha_K = -\frac{1}{2}\,\frac{G_K^{T}V_K}{V_K^{T}R_{XX}V_K} $$

[Step 5] Update the filter vector:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 6] Test for the stopping condition: If $G_K^{T}G_K \le \varepsilon$, terminate the adaptation. Otherwise, go to Step 2.
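The alternation in Step 3 (a steepest descent step on even K, an acceleration step along $H_K - H_{K-2}$ on odd K) is the "parallel tangents" scheme often called PARTAN. The sketch below is a minimal version for the mMSE case. It rests on one assumption, our reading of the start-up: $H_2$ is taken as one steepest descent step from the starting vector $H_1$, so that the first acceleration step is not degenerate. Function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def grad_mmse(r_xx, r_xs, h):
    """Gradient (2.63): G = 2 (R_XX H - R_XS)."""
    return [2.0 * (a - b) for a, b in zip(matvec(r_xx, h), r_xs)]


def best_step(r_xx, h, g, v):
    """Return H + alpha V with alpha = -(1/2) G^T V / (V^T R_XX V), eq. (2.71)."""
    vrv = dot(v, matvec(r_xx, v))
    if vrv == 0.0:                      # degenerate (zero) direction: no move
        return list(h)
    alpha = -0.5 * dot(g, v) / vrv
    return [a + alpha * b for a, b in zip(h, v)]


def asd_mmse(r_xx, r_xs, h_start, eps=1e-20, max_iter=1000):
    """Alternating SD / acceleration steps; returns (H, K at termination)."""
    h1 = [float(x) for x in h_start]
    g1 = grad_mmse(r_xx, r_xs, h1)
    if dot(g1, g1) <= eps:
        return h1, 1
    # Assumed start-up: H_2 is one steepest descent step away from H_1.
    hist = {1: h1, 2: best_step(r_xx, h1, g1, [-x for x in g1])}
    for k in range(2, max_iter):
        h = hist[k]
        g = grad_mmse(r_xx, r_xs, h)                      # Step 2
        if dot(g, g) <= eps:                              # Step 6
            return h, k
        if k % 2 == 0:
            v = [-x for x in g]                           # Step 3, K even
        else:
            v = [a - b for a, b in zip(h, hist[k - 2])]   # Step 3, K odd
        hist[k + 1] = best_step(r_xx, h, g, v)            # Steps 4 and 5
    return hist[max_iter], max_iter
```

On a two-dimensional quadratic, the two steepest descent steps followed by one acceleration step land on the minimum, which matches the $2n - 1$ bound quoted above.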



3. Amir's Method (AMM)

This method was suggested by the author at the beginning of the research. The purpose was to derive a method which converges faster than the steepest descent method. Experiments showed that the AMM method converges approximately three times faster than the SD method, as shown in Fig. 2.6a. AMM is not a conjugate gradient method and is not as fast as the conjugate gradient methods. However, it can replace the SD method as a robust and faster method.

The AMM gradient search method was designed based on the fact that the gradient of a unimodal performance function vanishes only once, at the stationary point of the performance function, which is the extremum point we are looking for.

The adaptation procedure is derived as follows.

The functional $F_K$ is defined as

$$ F_K = G_K^{T}G_K \tag{2.71-1} $$

where $G_K$ is the gradient of the performance function J, as in (2.63).

The adaptive filter is updated as given in (2.56) for the SD method. The adaptation gain $\alpha_K$ is computed according to the "best step" concept, here applied to minimize $F_{K+1}$ instead of $J(H_{K+1})$. Using (2.64), we obtain

$$ F_{K+1} = G_{K+1}^{T}G_{K+1} \tag{2.71-2} $$

$$ \begin{aligned} F_{K+1} &= \left(G_K + 2\alpha_K R_{XX}G_K\right)^{T}\left(G_K + 2\alpha_K R_{XX}G_K\right) \\ &= G_K^{T}G_K + 4\alpha_K G_K^{T}R_{XX}G_K + 4\alpha_K^{2}G_K^{T}R_{XX}^{2}G_K \end{aligned} \tag{2.71-3} $$

where the symmetry of $R_{XX}$ has been used. In order to get the best step, we take the derivative of $F_{K+1}$ with respect to the adaptation gain and set it to zero:

$$ \frac{dF_{K+1}}{d\alpha_K} = 4G_K^{T}R_{XX}G_K + 8\alpha_K G_K^{T}R_{XX}^{2}G_K = 0 \tag{2.71-4} $$

Solving (2.71-4), we get

$$ \alpha_K = -\frac{1}{2}\,\frac{G_K^{T}R_{XX}G_K}{G_K^{T}R_{XX}^{2}G_K} \tag{2.71-5} $$



The AMM adaptive filter is implemented by the following steps:

[Step 1] Set the initial filter vector $H_0$, the stopping bound $\varepsilon$, the correlation matrix $R_{XX}$ and the cross-correlation vector $R_{XS}$.

[Step 2] Compute the gradient $G_K$ of the performance function J:

$$ G_K = 2\left(R_{XX}H_K - R_{XS}\right) $$

[Step 3] Compute the adaptation gain:

$$ \alpha_K = -\frac{1}{2}\,\frac{G_K^{T}R_{XX}G_K}{G_K^{T}R_{XX}^{2}G_K} $$

[Step 4] Update the filter vector:

$$ H_{K+1} = H_K + \alpha_K G_K $$

[Step 5] Test for the stopping condition: If $F_K = G_K^{T}G_K < \varepsilon$, then terminate; otherwise, go to Step 2.
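The AMM steps differ from SD only in the gain: (2.71-5) chooses $\alpha_K$ to minimize the squared gradient norm $F_{K+1}$ instead of J. The sketch below is a minimal version under the same assumptions as the SD sketch (symmetric positive definite $R_{XX}$ as a list of rows, hypothetical function names). Because $R_{XX}$ is symmetric, $G_K^{T}R_{XX}^{2}G_K$ is computed as $(R_{XX}G_K)^{T}(R_{XX}G_K)$, which saves one matrix-vector product.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def grad_mmse(r_xx, r_xs, h):
    """Gradient (2.63): G = 2 (R_XX H - R_XS)."""
    return [2.0 * (a - b) for a, b in zip(matvec(r_xx, h), r_xs)]


def amm_mmse(r_xx, r_xs, h0, eps=1e-20, max_iter=10_000):
    """AMM adaptive filter with gain (2.71-5); returns (H, iterations)."""
    h = [float(x) for x in h0]
    for k in range(max_iter):
        g = grad_mmse(r_xx, r_xs, h)                    # Step 2
        if dot(g, g) <= eps:                            # Step 5: F_K = G^T G
            return h, k
        rg = matvec(r_xx, g)                            # R_XX G
        alpha = -0.5 * dot(g, rg) / dot(rg, rg)         # Step 3, eq. (2.71-5)
        h = [hi + alpha * gi for hi, gi in zip(h, g)]   # Step 4
    return h, max_iter
```

Since $\alpha_K = 0$ would leave $F_{K+1} = F_K$, the minimizing gain can never increase $F_K$. The sequence of gradient norms is therefore nonincreasing.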

4. Fletcher-Reeves Conjugate Gradient Method (CGF)

The Fletcher-Reeves conjugate gradient (CGF) method was first introduced in 1964 by Fletcher and Reeves [69]. The method is similar to the pioneering work of Hestenes and Stiefel [54]. The CGF method uses conjugate vectors as step directions.

Definition

The vectors $V_i$ are said to be "conjugate" with respect to the matrix $R_{XX}$ if they satisfy the following condition:

$$ V_i^{T}R_{XX}V_j = 0 \quad \text{for } i \ne j, \quad V_i \ne 0 $$

The importance of this method is its fast convergence rate for quadratic functions like (2.10). Apart from rounding errors, the method is proved to converge in n steps, where n is the dimension of the filter vector.

The adaptation gain of the CGF method is computed using Lemma 2.05 and the fact that the adaptive filter is updated by the iterative equation (2.67). Following equations (2.67) to (2.71) in a similar way, we obtain the adaptation gain as

$$ \alpha_K = -\frac{1}{2}\,\frac{G_K^{T}V_K}{V_K^{T}R_{XX}V_K} \tag{2.71-6} $$
The step direction vector $V_K$ is computed by the following iterative procedure [55]:

$$ V_K = -G_K, \quad \text{if } K \bmod n = 0 \tag{2.73} $$

$$ V_K = -G_K + \frac{G_K^{T}G_K}{G_{K-1}^{T}G_{K-1}}\,V_{K-1}, \quad \text{otherwise} \tag{2.74} $$

The CGF method was once applied to the Rosenbrock function [54], and the performance was poor. Subsequently, it was suggested to restart the method every n iterations, where n is the dimension of the vector H. This thesis confirmed that, for our two performance functions (2.10) and (2.25), this method converges faster when restarts are used.

The CGF adaptive filter is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$, the stopping bound $\varepsilon$, the autocorrelation matrix $R_{XX}$ and the cross-correlation vector $R_{XS}$.

[Step 2] Compute the gradient $G_K$ of the performance function J:

$$ G_K = 2\left(R_{XX}H_K - R_{XS}\right) $$

[Step 3] Compute the step direction vector $V_K$:

$$ V_K = \begin{cases} -G_K, & K \bmod n = 0 \\ -G_K + \dfrac{G_K^{T}G_K}{G_{K-1}^{T}G_{K-1}}\,V_{K-1}, & \text{else} \end{cases} $$

[Step 4] Compute the adaptation gain:

$$ \alpha_K = -\frac{1}{2}\,\frac{G_K^{T}V_K}{V_K^{T}R_{XX}V_K} $$

[Step 5] Update the filter vector:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 6] Test for the stopping condition: If $G_K^{T}G_K \le \varepsilon$, terminate the adaptation. Otherwise, go to Step 2.
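The CGF steps, including the restart every n iterations, can be sketched as follows. This is a minimal sketch, assuming a symmetric positive definite $R_{XX}$ stored as a list of rows; the function names are hypothetical. With exact best steps on a quadratic, the method should reach the optimum in at most n iterations, up to rounding.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def grad_mmse(r_xx, r_xs, h):
    """Gradient (2.63): G = 2 (R_XX H - R_XS)."""
    return [2.0 * (a - b) for a, b in zip(matvec(r_xx, h), r_xs)]


def cgf_mmse(r_xx, r_xs, h0, eps=1e-20, max_iter=1000):
    """Fletcher-Reeves CG with restarts every n steps; returns (H, iterations)."""
    n = len(h0)
    h = [float(x) for x in h0]
    g_prev = v_prev = None
    for k in range(max_iter):
        g = grad_mmse(r_xx, r_xs, h)                          # Step 2
        gg = dot(g, g)
        if gg <= eps:                                         # Step 6
            return h, k
        if k % n == 0:
            v = [-x for x in g]                               # restart, eq. (2.73)
        else:
            beta = gg / dot(g_prev, g_prev)                   # Fletcher-Reeves ratio
            v = [-a + beta * b for a, b in zip(g, v_prev)]    # eq. (2.74)
        alpha = -0.5 * dot(g, v) / dot(v, matvec(r_xx, v))    # Step 4, eq. (2.71-6)
        h = [a + alpha * b for a, b in zip(h, v)]             # Step 5
        g_prev, v_prev = g, v
    return h, max_iter
```

For the 3 x 3 system $R_{XX} = \begin{bmatrix} 4 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 2 \end{bmatrix}$, $R_{XS} = [6, 10, 8]^{T}$, the optimum is $H^{*} = [1, 2, 3]^{T}$.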



5. Polak-Ribière Conjugate Gradient Method (CGP)

The Polak-Ribière conjugate gradient (CGP) method is similar to the CGF method. The difference is in the computation of the search direction when $K \bmod n \ne 0$. In [56], Powell gave a theoretical reason for favoring the Polak-Ribière algorithm. In this thesis, the author found the CGP method more efficient and faster converging than the CGF method (see Section F).

The search direction of the CGP method is given by the following expression:

$$ V_K = \begin{cases} -G_K, & K \bmod n = 0 \\ -G_K + \dfrac{G_K^{T}\left(G_K - G_{K-1}\right)}{G_{K-1}^{T}G_{K-1}}\,V_{K-1}, & \text{else} \end{cases} \tag{2.75} $$

The CGP adaptive filter is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$, the stopping bound $\varepsilon$, the autocorrelation matrix $R_{XX}$ and the cross-correlation vector $R_{XS}$.

[Step 2] Compute the gradient of the performance function J:

$$ G_K = 2\left(R_{XX}H_K - R_{XS}\right) $$

[Step 3] Compute the step direction vector $V_K$:

$$ V_K = \begin{cases} -G_K, & K \bmod n = 0 \\ -G_K + \dfrac{G_K^{T}\left(G_K - G_{K-1}\right)}{G_{K-1}^{T}G_{K-1}}\,V_{K-1}, & \text{else} \end{cases} $$

[Step 4] Compute the adaptation gain:

$$ \alpha_K = -\frac{1}{2}\,\frac{G_K^{T}V_K}{V_K^{T}R_{XX}V_K} $$

[Step 5] Update the filter vector:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 6] Test for the stopping condition: If $G_K^{T}G_K \le \varepsilon$, terminate the adaptation. Otherwise, go to Step 2.
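The CGP listing differs from CGF only in the ratio multiplying $V_{K-1}$ in Step 3. The minimal sketch below makes that single change. It assumes a symmetric positive definite $R_{XX}$, and its names are hypothetical. On an exactly quadratic function with exact best steps, consecutive gradients are orthogonal, so the Polak-Ribière and Fletcher-Reeves ratios coincide. The two methods differ in practice once rounding errors or a nonquadratic J enter.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def grad_mmse(r_xx, r_xs, h):
    """Gradient (2.63): G = 2 (R_XX H - R_XS)."""
    return [2.0 * (a - b) for a, b in zip(matvec(r_xx, h), r_xs)]


def cgp_mmse(r_xx, r_xs, h0, eps=1e-20, max_iter=1000):
    """Polak-Ribiere CG with restarts every n steps; returns (H, iterations)."""
    n = len(h0)
    h = [float(x) for x in h0]
    g_prev = v_prev = None
    for k in range(max_iter):
        g = grad_mmse(r_xx, r_xs, h)                          # Step 2
        if dot(g, g) <= eps:                                  # Step 6
            return h, k
        if k % n == 0:
            v = [-x for x in g]                               # restart
        else:
            dg = [a - b for a, b in zip(g, g_prev)]           # G_K - G_{K-1}
            beta = dot(g, dg) / dot(g_prev, g_prev)           # ratio in eq. (2.75)
            v = [-a + beta * b for a, b in zip(g, v_prev)]
        alpha = -0.5 * dot(g, v) / dot(v, matvec(r_xx, v))    # Step 4
        h = [a + alpha * b for a, b in zip(h, v)]             # Step 5
        g_prev, v_prev = g, v
    return h, max_iter
```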



6. Davidon-Fletcher-Powell Variable Metric Method (DFP)

One of the most efficient searching methods is the Davidon-Fletcher-Powell (DFP) method. It was developed by Fletcher and Powell [57] from the variable metric method due to Davidon [54,58]. The term "variable metric" was coined by Davidon to describe methods which, at the Kth iteration, utilize an increment of the form

$$ \Delta H_K = -\alpha_K A_K G_K \tag{2.76} $$

and update the metric-correction transformation $A_K$ from iteration to iteration. The DFP method updates the metric $A_K$ by the iterative expression

$$ A_{K+1} = A_K + \alpha_K\,\frac{V_K V_K^{T}}{V_K^{T}P_K} - \frac{A_K P_K P_K^{T}A_K}{P_K^{T}A_K P_K} \tag{2.77} $$

where

$$ V_K = -A_K G_K \tag{2.78} $$

$$ P_K = G_{K+1} - G_K \tag{2.79} $$

Fletcher and Powell proved that, for a general function J, a positive definite $A_K$ implies that $A_{K+1}$ is also positive definite [58]. For the performance function J given in (2.10), it can be shown [59] that the step directions $V_K$ are mutually conjugate, so the DFP method exhibits quadratic termination in n steps.

The adaptation gain of the DFP adaptive filter is based on the best step concept introduced in Lemma 2.05.

Using the filter update of the DFP method,

$$ H_{K+1} = H_K + \alpha_K V_K \tag{2.80} $$

the adaptation gain is found to be

$$ \alpha_K = -\frac{1}{2}\,\frac{G_K^{T}V_K}{V_K^{T}R_{XX}V_K} $$

The adaptive filter designed by the DFP method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$, the starting correction metric $A_0 = I$ (where I is the identity matrix), the gradient $G_0$, the stopping bound $\varepsilon$, the autocorrelation matrix $R_{XX}$ and the cross-correlation vector $R_{XS}$.

[Step 2] Compute the step direction vector:

$$ V_K = -A_K G_K $$

[Step 3] Compute the adaptation gain:

$$ \alpha_K = -\frac{1}{2}\,\frac{G_K^{T}V_K}{V_K^{T}R_{XX}V_K} $$

[Step 4] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 5] Compute the gradient of the function J:

$$ G_{K+1} = 2\left(R_{XX}H_{K+1} - R_{XS}\right) $$

[Step 6] Compute the vector $P_K$:

$$ P_K = G_{K+1} - G_K $$

[Step 7] Update the variable metric:

$$ A_{K+1} = A_K + \alpha_K\,\frac{V_K V_K^{T}}{V_K^{T}P_K} - \frac{A_K P_K P_K^{T}A_K}{P_K^{T}A_K P_K} $$

[Step 8] Test for the stopping criterion: If $G_{K+1}^{T}G_{K+1} \le \varepsilon$, terminate the adaptation; otherwise, go to Step 2.
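The eight DFP steps can be sketched as below. This is a minimal sketch, assuming a symmetric positive definite $R_{XX}$ and hypothetical function names. The stopping test on $G_{K+1}$ (Step 8) is evaluated at the top of the next pass, which is equivalent. Since $A_K$ stays symmetric, $A_K P_K P_K^{T} A_K$ is formed as the outer product of $A_K P_K$ with itself.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def grad_mmse(r_xx, r_xs, h):
    """Gradient (2.63): G = 2 (R_XX H - R_XS)."""
    return [2.0 * (a - b) for a, b in zip(matvec(r_xx, h), r_xs)]


def dfp_mmse(r_xx, r_xs, h0, eps=1e-20, max_iter=1000):
    """DFP variable metric method with best-step gain; returns (H, iterations)."""
    n = len(h0)
    h = [float(x) for x in h0]
    a_k = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A_0 = I
    g = grad_mmse(r_xx, r_xs, h)
    for k in range(max_iter):
        if dot(g, g) <= eps:                                  # Step 8
            return h, k
        v = [-x for x in matvec(a_k, g)]                      # Step 2: V_K = -A_K G_K
        alpha = -0.5 * dot(g, v) / dot(v, matvec(r_xx, v))    # Step 3
        h = [hi + alpha * vi for hi, vi in zip(h, v)]         # Step 4
        g_new = grad_mmse(r_xx, r_xs, h)                      # Step 5
        p = [x - y for x, y in zip(g_new, g)]                 # Step 6: P_K
        ap = matvec(a_k, p)                                   # A_K P_K
        vp, pap = dot(v, p), dot(p, ap)
        a_k = [[a_k[i][j] + alpha * v[i] * v[j] / vp - ap[i] * ap[j] / pap
                for j in range(n)] for i in range(n)]         # Step 7, eq. (2.77)
        g = g_new
    return h, max_iter
```

As with the conjugate gradient sketches, the 3 x 3 example with $H^{*} = [1, 2, 3]^{T}$ should terminate after about n = 3 updates.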



D. DERIVATION OF SEARCHING TECHNIQUES FOR EXTREMUM:

GRADIENT SEARCH METHODS FOR THE MAXIMUM OF THE
MSNR PERFORMANCE FUNCTION

1. Approximation for Best Step Adaptation Gain

The maximization of signal-to-noise ratio (MSNR) performance function J, as defined in (2.25), is a nonlinear function of the filter vector H. Because J is not quadratic, new difficulties arise. The methods which have been theoretically proved to converge in n steps for quadratic cases like the mMSE no longer converge as fast.

The adaptation gain can no longer be efficiently computed by the best step concept, because of the large amount of computation required to obtain the best step. To keep this gradient search method efficient, the adaptation gain is instead an approximation of the "best step". It is chosen to generate a nondecreasing sequence of performance function values $\{J_K\}$ which finally converges to the maximum of J.

Lemma 2.06

Let the performance function J be defined as in (2.25), and let the adaptive filter be updated according to (2.67). Then the best step adaptation gain at iteration step K satisfies the relation

$$ \alpha_K = -\frac{H_K^{T}\left(R_{SS} - J_{K+1}R_{NN}\right)V_K}{V_K^{T}\left(R_{SS} - J_{K+1}R_{NN}\right)V_K} $$
Proof

The proof is based on Lemma 2.05. Using (2.67) and Lemma 2.05, we obtain

$$ G_{K+1}^{T}V_K = 0 \tag{2.82} $$

where $G_{K+1}$ is the gradient of the performance function J at the (K+1)th iteration step, and $V_K$ is the step direction (search direction) vector. But according to (2.25),

$$ J_{K+1} = \frac{H_{K+1}^{T}R_{SS}H_{K+1}}{H_{K+1}^{T}R_{NN}H_{K+1}} $$

The gradient of $J_{K+1}$ with respect to $H_{K+1}$ is

$$ G_{K+1} = \frac{2}{H_{K+1}^{T}R_{NN}H_{K+1}}\left(R_{SS} - J_{K+1}R_{NN}\right)H_{K+1} \tag{2.83} $$

Using (2.83) in (2.82), we obtain

$$ \frac{2}{H_{K+1}^{T}R_{NN}H_{K+1}}\,H_{K+1}^{T}\left(R_{SS} - J_{K+1}R_{NN}\right)V_K = 0 \tag{2.84} $$

But according to (2.67), $H_{K+1} = H_K + \alpha_K V_K$. Since $R_{SS}$ and $R_{NN}$ are symmetric and positive definite, (2.84) gives

$$ \left(H_K + \alpha_K V_K\right)^{T}\left(R_{SS} - J_{K+1}R_{NN}\right)V_K = 0 \tag{2.85} $$

So,

$$ H_K^{T}\left(R_{SS} - J_{K+1}R_{NN}\right)V_K + \alpha_K V_K^{T}\left(R_{SS} - J_{K+1}R_{NN}\right)V_K = 0 $$

$$ \alpha_K = -\frac{H_K^{T}\left(R_{SS} - J_{K+1}R_{NN}\right)V_K}{V_K^{T}\left(R_{SS} - J_{K+1}R_{NN}\right)V_K} \tag{2.86} $$

Q.E.D.

The adaptation gain in (2.86) cannot be obtained directly. It is a function of $J_{K+1}$, which itself cannot be computed without $\alpha_K$. Thus the "best step" concept introduces a nonlinear problem for the MSNR performance function. To avoid solving a nonlinear equation in each iteration, the adaptation gain is approximated by using $J_K$ instead of $J_{K+1}$. Since $J_K$ is obtained one step prior to $J_{K+1}$, no nonlinear equation needs to be solved. Now we must prove that this choice of adaptation gain for the MSNR performance function generates a nondecreasing sequence $\{J_K\}$ which eventually converges to the optimum $J_{\max}$.

Lemma 2.07

Let the performance function J be defined as in (2.25), let the filter $H_K$ be updated by (2.67), and let the adaptation gain $\alpha_K$ be given by

$$ \alpha_K = -\frac{H_K^{T}\left(R_{SS} - J_K R_{NN}\right)V_K}{V_K^{T}\left(R_{SS} - J_K R_{NN}\right)V_K} \tag{2.87} $$
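Then it will generate a nondecreasing sequence $\{J_K\}$ which converges to the maximum of J.

Proof

Using (2.25), we obtain

$$ J_{K+1} = \frac{H_{K+1}^{T}R_{SS}H_{K+1}}{H_{K+1}^{T}R_{NN}H_{K+1}} \tag{2.88} $$

Substitution of (2.67) in (2.88) gives

$$ \begin{aligned} J_{K+1} &= \frac{(H_K + \alpha_K V_K)^{T}R_{SS}(H_K + \alpha_K V_K)}{(H_K + \alpha_K V_K)^{T}R_{NN}(H_K + \alpha_K V_K)} \\ &= J_K\,\frac{1 + 2\alpha_K\dfrac{H_K^{T}R_{SS}V_K}{H_K^{T}R_{SS}H_K} + \alpha_K^{2}\dfrac{V_K^{T}R_{SS}V_K}{H_K^{T}R_{SS}H_K}}{1 + 2\alpha_K\dfrac{H_K^{T}R_{NN}V_K}{H_K^{T}R_{NN}H_K} + \alpha_K^{2}\dfrac{V_K^{T}R_{NN}V_K}{H_K^{T}R_{NN}H_K}} \end{aligned} \tag{2.89} $$

Equation (2.89) is simplified using the fact that $R_{SS}$ and $R_{NN}$ are symmetric and positive definite.

In order to obtain a nondecreasing sequence, we must satisfy $J_{K+1} \ge J_K$. But the sequence is positive, so

$$ \frac{J_{K+1}}{J_K} \ge 1 \tag{2.90} $$

In order for (2.89) to satisfy (2.90), it can be seen that

$$ 1 + 2\alpha_K\frac{H_K^{T}R_{SS}V_K}{H_K^{T}R_{SS}H_K} + \alpha_K^{2}\frac{V_K^{T}R_{SS}V_K}{H_K^{T}R_{SS}H_K} \ge 1 + 2\alpha_K\frac{H_K^{T}R_{NN}V_K}{H_K^{T}R_{NN}H_K} + \alpha_K^{2}\frac{V_K^{T}R_{NN}V_K}{H_K^{T}R_{NN}H_K} \tag{2.91} $$

Using (2.91), (2.25) and the fact that $R_{SS}$ and $R_{NN}$ are positive definite matrices, we obtain

$$ 2\alpha_K H_K^{T}R_{SS}V_K + \alpha_K^{2}V_K^{T}R_{SS}V_K \ge J_K\left(2\alpha_K H_K^{T}R_{NN}V_K + \alpha_K^{2}V_K^{T}R_{NN}V_K\right) \tag{2.92} $$

that is,

$$ \alpha_K\left[2H_K^{T}\left(R_{SS} - J_K R_{NN}\right)V_K + \alpha_K V_K^{T}\left(R_{SS} - J_K R_{NN}\right)V_K\right] \ge 0 $$

The left side is a quadratic in $\alpha_K$ with roots 0 and $-2\,H_K^{T}(R_{SS} - J_K R_{NN})V_K \,/\, V_K^{T}(R_{SS} - J_K R_{NN})V_K$. When $V_K^{T}(R_{SS} - J_K R_{NN})V_K < 0$, the quadratic is nonnegative for every $\alpha_K$ between these two roots:

$$ \alpha_K \ \text{between} \ 0 \ \text{and} \ -2\,\frac{H_K^{T}\left(R_{SS} - J_K R_{NN}\right)V_K}{V_K^{T}\left(R_{SS} - J_K R_{NN}\right)V_K} \tag{2.93} $$

The adaptation gain given in (2.87) is the midpoint of this interval. It therefore satisfies (2.93) and generates a nondecreasing sequence.

Q.E.D.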
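The approximate gain (2.87) and the ratio form (2.89) are easy to check numerically. The sketch below is a minimal check, assuming small dense matrices as Python lists; the function names are hypothetical. It computes the gain with $J_K$ in place of $J_{K+1}$ and evaluates $J_{K+1}$ both directly and through (2.89). In the worked example, $R_{SS} = \mathrm{diag}(3, 1)$, $R_{NN} = I$ and $H_K = [1, 0.5]^{T}$, so $J_K = 2.6$. The direction $V_K$ is the gradient (2.83) at $H_K$, which works out by hand to $[0.64, -1.28]^{T}$.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def quad(x, m, y):
    """Bilinear form x^T M y."""
    return dot(x, matvec(m, y))


def msnr(h, r_ss, r_nn):
    """Performance function (2.25)."""
    return quad(h, r_ss, h) / quad(h, r_nn, h)


def gain_287(h, v, r_ss, r_nn):
    """Approximate best-step gain (2.87), using J_K in place of J_{K+1}."""
    j = msnr(h, r_ss, r_nn)
    q = [[s - j * t for s, t in zip(rs, rn)] for rs, rn in zip(r_ss, r_nn)]
    return -quad(h, q, v) / quad(v, q, v)


def j_next_289(h, v, alpha, r_ss, r_nn):
    """J_{K+1} evaluated through the ratio form (2.89)."""
    s_hh, s_hv, s_vv = quad(h, r_ss, h), quad(h, r_ss, v), quad(v, r_ss, v)
    n_hh, n_hv, n_vv = quad(h, r_nn, h), quad(h, r_nn, v), quad(v, r_nn, v)
    num = 1 + 2 * alpha * s_hv / s_hh + alpha ** 2 * s_vv / s_hh
    den = 1 + 2 * alpha * n_hv / n_hh + alpha ** 2 * n_vv / n_hh
    return (s_hh / n_hh) * num / den
```

In this example the gain is $\alpha_K = 25/48$, and the new value is $J_{K+1} = 193/65 \approx 2.969$, larger than $J_K = 2.6$.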

Lemma 2.08

Let the performance function J be defined as in (2.25) and the filter vector be updated by (2.67). Then, for each iteration step K, the gradient $G_K$ of J is orthogonal to the filter vector $H_K$, regardless of the adaptation gain.

Proof

The performance function given by (2.25) is

$$ J_K = \frac{H_K^{T}R_{SS}H_K}{H_K^{T}R_{NN}H_K} $$

The gradient of $J_K$ with respect to the filter vector $H_K$ is given by

$$ G_K = \frac{2}{H_K^{T}R_{NN}H_K}\left(R_{SS} - J_K R_{NN}\right)H_K \tag{2.83} $$

From (2.83), it follows that

$$ H_K^{T}G_K = \frac{2}{H_K^{T}R_{NN}H_K}\,H_K^{T}\left(R_{SS} - J_K R_{NN}\right)H_K = 2\left(\frac{H_K^{T}R_{SS}H_K}{H_K^{T}R_{NN}H_K} - J_K\,\frac{H_K^{T}R_{NN}H_K}{H_K^{T}R_{NN}H_K}\right) \tag{2.94} $$

Using (2.25) in (2.94), we obtain

$$ H_K^{T}G_K = 2\left(J_K - J_K\right) = 0 \tag{2.95} $$

Therefore,

$$ H_K \perp G_K \tag{2.96} $$

Thus the filter vector at iteration step K is orthogonal to the gradient $G_K$ of the performance function J.

Q.E.D.
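Lemma 2.08 and the gradient formula (2.83) can both be checked numerically. The sketch below is a minimal check with hypothetical names. It computes $J_K$ and $G_K$ from (2.83) for an arbitrary symmetric $R_{SS}$ and positive definite $R_{NN}$, verifies $H_K^{T}G_K = 0$, and compares $G_K$ with central finite differences. The finite-difference comparison is our addition; it is not part of the thesis procedure.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def msnr_grad(h, r_ss, r_nn):
    """Return (J_K, G_K) with G_K = 2/(H^T R_NN H) (R_SS - J_K R_NN) H, eq. (2.83)."""
    rs_h, rn_h = matvec(r_ss, h), matvec(r_nn, h)
    d = dot(h, rn_h)
    j = dot(h, rs_h) / d
    g = [2.0 / d * (a - j * b) for a, b in zip(rs_h, rn_h)]
    return j, g


def finite_diff_grad(h, r_ss, r_nn, step=1e-6):
    """Central-difference approximation of the gradient of J, for checking (2.83)."""
    out = []
    for i in range(len(h)):
        hp, hm = list(h), list(h)
        hp[i] += step
        hm[i] -= step
        jp = msnr_grad(hp, r_ss, r_nn)[0]
        jm = msnr_grad(hm, r_ss, r_nn)[0]
        out.append((jp - jm) / (2 * step))
    return out
```

For any nonzero $H_K$, `dot(h, msnr_grad(h, r_ss, r_nn)[1])` should be zero up to rounding.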

2. Steepest Descent Method (SD)

The steepest descent (SD) method described for the quadratic mMSE performance function can be used here for the MSNR performance function, with some exceptions. The concept of the "best step" is used through an approximation of the best adaptation gain. Also, the gradient of the performance function with respect to the filter vector is itself a function of the performance function, so successive values of the performance function must be computed.

The adaptation equation used here is identical to (2.56):

$$ H_{K+1} = H_K + \alpha_K G_K $$

The adaptation gain is obtained from (2.87) by replacing the step direction vector $V_K$ with the gradient $G_K$ (the direction of the SD). The adaptation gain obtained is

$$ \alpha_K = -\frac{H_K^{T}\left(R_{SS} - J_K R_{NN}\right)G_K}{G_K^{T}\left(R_{SS} - J_K R_{NN}\right)G_K} \tag{2.97} $$

The matrix $Q_K$ is defined as

$$ Q_K = R_{SS} - J_K R_{NN} \tag{2.98} $$

Substituting (2.98) into (2.97), we obtain

$$ \alpha_K = -\frac{H_K^{T}Q_K G_K}{G_K^{T}Q_K G_K} \tag{2.99} $$

The adaptive filter designed by the SD method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and a stopping bound $\delta$.

[Step 2] Compute the performance function $J_K$ as in (2.25):

$$ J_K = \frac{H_K^{T}R_{SS}H_K}{H_K^{T}R_{NN}H_K} $$

[Step 3] Compute the gradient $G_K = \nabla_{H_K}J_K$:

$$ G_K = \frac{2}{H_K^{T}R_{NN}H_K}\,Q_K H_K $$

where $Q_K$ is given by (2.98).

[Step 4] Compute the adaptation gain:

$$ \alpha_K = -\frac{H_K^{T}Q_K G_K}{G_K^{T}Q_K G_K} $$

[Step 5] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K G_K $$

[Step 6] Test for the stopping condition: If $\lVert H_{K+1} - H_K \rVert / \lVert H_K \rVert \le \delta$, then terminate the adaptation. Otherwise, go to Step 2.

This stopping condition differs from the one used for the mMSE criterion. Here the gradient $G_K$ is a nonlinear function of the filter vector, and $G_K \to 0$ as $\lVert H_K \rVert \to \infty$ (use (2.83) to verify). Thus, a vanishing gradient does not necessarily indicate the stationary point; it can also occur when the system diverges.
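The MSNR steepest descent steps can be sketched as below. This is a minimal sketch with hypothetical names, assuming small dense symmetric matrices with $R_{NN}$ positive definite. It adds one guard that the listing does not have: it returns early if $G_K^{T}Q_KG_K$ is exactly zero, where the gain (2.99) is undefined. We make no claim about convergence beyond the small example used here.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def msnr_sd(r_ss, r_nn, h0, delta=1e-12, max_iter=200):
    """MSNR steepest descent with the approximate gain (2.99); returns (H, J, iterations)."""
    h = [float(x) for x in h0]
    for k in range(max_iter):
        rs_h, rn_h = matvec(r_ss, h), matvec(r_nn, h)
        d = dot(h, rn_h)
        j = dot(h, rs_h) / d                                   # Step 2, eq. (2.25)
        q = [[s - j * t for s, t in zip(rs, rn)]
             for rs, rn in zip(r_ss, r_nn)]                    # Q_K, eq. (2.98)
        g = [2.0 / d * x for x in matvec(q, h)]                # Step 3, eq. (2.83)
        qg = matvec(q, g)
        gqg = dot(g, qg)
        if gqg == 0.0:                                         # gain undefined: stop
            return h, j, k
        alpha = -dot(h, qg) / gqg                              # Step 4, eq. (2.99)
        h_new = [a + alpha * b for a, b in zip(h, g)]          # Step 5
        diff = [a - b for a, b in zip(h_new, h)]
        rel_change = (dot(diff, diff) / dot(h, h)) ** 0.5
        h = h_new
        if rel_change <= delta:                                # Step 6
            break
    return h, dot(h, matvec(r_ss, h)) / dot(h, matvec(r_nn, h)), k + 1
```

Take $R_{SS} = \mathrm{diag}(3, 1)$, $R_{NN} = I$ and $H_0 = [1, 0.5]^{T}$. Let $\theta_K$ be the angle between $H_K$ and the first axis. For this 2 x 2 case, one can check by hand that each step maps $\tan\theta_K$ to $-\tan^{3}\theta_K$. Starting within 45 degrees of the optimal direction, J therefore approaches $J_{\max} = 3$ in a few iterations.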
3. Accelerated Steepest Descent Method (ASD)

The ASD method derived in Section II.C.2 is applied in this section, with some modifications, to design an adaptive filter which maximizes the performance function J in (2.25). The adaptive filter is updated according to (2.67):

$$ H_{K+1} = H_K + \alpha_K V_K $$

The step direction vector $V_K$ is computed from the filter vector $H_K$ and the gradient $G_K$ of the performance function $J_K$:

$$ V_K = \begin{cases} -G_K, & K = 2, 4, 6, \ldots \\ H_K - H_{K-2}, & K = 3, 5, 7, \ldots \end{cases} $$

The gradient $G_K$ is obtained from (2.83), and the adaptation gain from (2.87) and Lemma 2.07.

The adaptive filter designed by the ASD method is carried out in the following steps:

[Step 1] Set a starting filter vector $H_0 = H_1$ and a stopping bound $\delta$, and compute the initial performance function and gradient.

[Step 2] Compute the performance function value at iteration step K:

$$ J_K = \frac{H_K^{T}R_{SS}H_K}{H_K^{T}R_{NN}H_K} $$

[Step 3] Compute the gradient $G_K$ of $J_K$ with respect to $H_K$:

$$ G_K = \frac{2}{H_K^{T}R_{NN}H_K}\,Q_K H_K $$

where $Q_K$ is given by (2.98).

[Step 4] Compute the step direction vector $V_K$:

$$ V_K = \begin{cases} -G_K, & K = 2, 4, 6, \ldots \\ H_K - H_{K-2}, & K = 3, 5, 7, \ldots \end{cases} $$

[Step 5] Compute the adaptation gain:

$$ \alpha_K = -\frac{H_K^{T}Q_K V_K}{V_K^{T}Q_K V_K} $$

[Step 6] Update the filter vector:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 7] Test for the stopping condition: If $\lVert H_{K+1} - H_K \rVert / \lVert H_K \rVert \le \delta$, then terminate the adaptation; otherwise, go to Step 2.



4. Fletcher-Reeves Conjugate Gradient Method (CGF)

The Fletcher-Reeves conjugate gradient (CGF) method is applied to the MSNR adaptive filter in a similar way as for the mMSE adaptive filter. However, the nonlinear MSNR performance function requires more computation and uses an approximation instead of the true "best step". The "restart" concept was used and found to accelerate convergence.

The adaptive filter based on the CGF method is updated by the following iterative scheme:

$$ H_{K+1} = H_K + \alpha_K V_K \tag{2.100} $$

The step direction vector $V_K$ is obtained as in Section II.C.4 by the expression

$$ V_K = \begin{cases} -G_K, & K \bmod n = 0 \\ -G_K + \dfrac{G_K^{T}G_K}{G_{K-1}^{T}G_{K-1}}\,V_{K-1}, & \text{else} \end{cases} \tag{2.101} $$

The adaptation gain is obtained from Lemma 2.07 and given by

$$ \alpha_K = -\frac{H_K^{T}\left(R_{SS} - J_K R_{NN}\right)V_K}{V_K^{T}\left(R_{SS} - J_K R_{NN}\right)V_K} \tag{2.102} $$

Using definition (2.98), we obtain

$$ \alpha_K = -\frac{H_K^{T}Q_K V_K}{V_K^{T}Q_K V_K} \tag{2.103} $$

The adaptive filter designed by the CGF method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and a stopping bound $\delta$.

[Step 2] Compute the performance function $J_K$:

$$ J_K = \frac{H_K^{T}R_{SS}H_K}{H_K^{T}R_{NN}H_K} $$

[Step 3] Compute the gradient $G_K$ of $J_K$ with respect to $H_K$:

$$ G_K = \frac{2}{H_K^{T}R_{NN}H_K}\,Q_K H_K $$

where $Q_K$ is given by (2.98).

[Step 4] Compute the step direction vector $V_K$:

$$ V_K = \begin{cases} -G_K, & K \bmod n = 0 \\ -G_K + \dfrac{G_K^{T}G_K}{G_{K-1}^{T}G_{K-1}}\,V_{K-1}, & \text{else} \end{cases} $$

[Step 5] Compute the adaptation gain:

$$ \alpha_K = -\frac{H_K^{T}Q_K V_K}{V_K^{T}Q_K V_K} $$

[Step 6] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 7] Test for the stopping condition: If $\lVert H_{K+1} - H_K \rVert / \lVert H_K \rVert \le \delta$, then terminate the adaptation. Otherwise, go to Step 2.

5. Polak-Ribière Conjugate Gradient Method (CGP)

The CGP method is similar to the CGF method. The only difference is the way the step direction is computed. The CGP method uses the following expression to compute the step direction vector $V_K$:

$$ V_K = \begin{cases} -G_K, & K \bmod n = 0 \\ -G_K + \dfrac{G_K^{T}\left(G_K - G_{K-1}\right)}{G_{K-1}^{T}G_{K-1}}\,V_{K-1}, & \text{else} \end{cases} $$

All the rest is identical to the CGF method. However, this method was found to converge much faster than the CGF method for all the images tested in this thesis.

The adaptive filter designed by the CGP method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and a stopping bound $\delta$.

[Step 2] Compute the performance function $J_K$:

$$ J_K = \frac{H_K^{T}R_{SS}H_K}{H_K^{T}R_{NN}H_K} $$

[Step 3] Compute the gradient $G_K$ of $J_K$ with respect to $H_K$:

$$ G_K = \frac{2}{H_K^{T}R_{NN}H_K}\,Q_K H_K $$

where $Q_K$ is given by (2.98).

[Step 4] Compute the step direction vector $V_K$:

$$ V_K = \begin{cases} -G_K, & K \bmod n = 0 \\ -G_K + \dfrac{G_K^{T}\left(G_K - G_{K-1}\right)}{G_{K-1}^{T}G_{K-1}}\,V_{K-1}, & \text{else} \end{cases} $$

[Step 5] Compute the adaptation gain:

$$ \alpha_K = -\frac{H_K^{T}Q_K V_K}{V_K^{T}Q_K V_K} $$

[Step 6] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 7] Test for the stopping condition: If $\lVert H_{K+1} - H_K \rVert / \lVert H_K \rVert \le \delta$, then terminate the adaptation; otherwise, go to Step 2.

6. Davidon-Fletcher-Powell Variable Metric Method (DFP)

The DFP method is applied to the MSNR adaptive filter in a similar way as for the mMSE adaptive filter. The major differences are the approximation of the adaptation gain and the need to evaluate the performance function at every iteration step K.

The adaptive filter based on the DFP method is updated by the following iterative scheme:

$$ H_{K+1} = H_K + \alpha_K V_K \tag{2.104} $$

The step direction vector $V_K$ is obtained by the variable metric:

$$ V_K = -A_K G_K \tag{2.105} $$

The adaptation gain is obtained from Lemma 2.07:

$$ \alpha_K = -\frac{H_K^{T}Q_K V_K}{V_K^{T}Q_K V_K} \tag{2.106} $$

where $Q_K$ is given by (2.98). The metric is updated by the DFP iterative procedure:

$$ A_{K+1} = A_K + \alpha_K\,\frac{V_K V_K^{T}}{V_K^{T}P_K} - \frac{A_K P_K P_K^{T}A_K}{P_K^{T}A_K P_K} \tag{2.107} $$

The vector $P_K$ in (2.107) is defined as

$$ P_K = G_{K+1} - G_K \tag{2.108} $$

The adaptive filter designed by the DFP method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and the starting correction metric $A_0 = I$ (where I is the identity matrix). Compute the gradient $G_0$ of the performance function as before, and set the stopping bound $\delta$.

[Step 2] Compute the step direction vector:

$$ V_K = -A_K G_K $$

[Step 3] Compute the adaptation gain:

$$ \alpha_K = -\frac{H_K^{T}Q_K V_K}{V_K^{T}Q_K V_K} $$

[Step 4] Update the filter vector $H_K$:

$$ H_{K+1} = H_K + \alpha_K V_K $$

[Step 5] Compute the performance function:

$$ J_{K+1} = \frac{H_{K+1}^{T}R_{SS}H_{K+1}}{H_{K+1}^{T}R_{NN}H_{K+1}} $$

[Step 6] Compute the gradient of the performance function with respect to the filter vector:

$$ G_{K+1} = \frac{2}{H_{K+1}^{T}R_{NN}H_{K+1}}\,Q_{K+1}H_{K+1} $$

where $Q_{K+1}$ is obtained by updating (2.98).

[Step 7] Compute the vector $P_K$ by (2.108):

$$ P_K = G_{K+1} - G_K $$

[Step 8] Update the correction matrix by (2.107):

$$ A_{K+1} = A_K + \alpha_K\,\frac{V_K V_K^{T}}{V_K^{T}P_K} - \frac{A_K P_K P_K^{T}A_K}{P_K^{T}A_K P_K} \tag{2.107} $$

[Step 9] Test for the stopping condition. This step could be done after Step 4 to save some computation, but it is placed here to follow the same pattern as all the other methods.

If $\lVert H_{K+1} - H_K \rVert / \lVert H_K \rVert \le \delta$, then terminate the adaptation procedure; otherwise, go to Step 2.
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7 . Amir's Transform Method (AT) 

From test results, which will be shown in (2.17 — 
2.28), it was observed that a faster convergence method will 
be helpful for designing the MSNR adaptive filter. Both the 
conjugate gradient method by Pollack and the variable metric 
method following Davidon do not exhibit the same convergence 
speed as for the quadratic mMSE case. The reason for this 
slower convergence for the MSNR performance function is the 
nonlinear nonquadratic performance function as shown in (2.25). 
It was then decided to derive a method tailored for this 
performance function. The derivation of this method is based 
on the generalized eigenvalue/eigenvector problem introduced 
by the stationary points of the performance function J in 
(2.25). The stationary point of* the performance function J 
in (2.25) satisfies (2.40) which can be written in the form: 

~XX ‘ 'ma^^* M* = ^ (2.109) 



where H* is the optimal filter vector which maximizes the 
performance function J. 

The optimal filter H* satisfies 



H* 



1 _ 

J 

max 





H* 



( 2 . 110 ) 



From equation (2.110), it is clear that an adaptive filter which uses the transform matrix $\frac{1}{J} R_{NN}^{-1} R_{SS}$ to update the filter will satisfy (2.110) if it converges to the optimum.
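As a numerical check of (2.109) and (2.110), the minimal NumPy sketch below computes the MSNR optimum directly as the dominant eigenvector of $R_{NN}^{-1} R_{SS}$. The function names `msnr_optimum` and `performance` and the 3 x 3 matrices are hypothetical illustrations, not the thesis data; the sketch assumes $R_{NN}$ and $R_{SS}$ are symmetric positive definite, so the eigenvalues of $R_{NN}^{-1} R_{SS}$ are real.

```python
import numpy as np


def msnr_optimum(R_ss, R_nn):
    """Return (J_max, H_star) for J(H) = H^T R_SS H / H^T R_NN H.

    H_star is the eigenvector of R_NN^{-1} R_SS for its largest eigenvalue
    J_max, i.e. a solution of (R_SS - J_max R_NN) H_star = 0, cf. (2.109).
    """
    T = np.linalg.solve(R_nn, R_ss)        # R_NN^{-1} R_SS (not symmetric)
    eigvals, eigvecs = np.linalg.eig(T)
    i = int(np.argmax(eigvals.real))       # eigenvalues are real for SPD inputs
    J_max = float(eigvals[i].real)
    H_star = eigvecs[:, i].real
    return J_max, H_star / np.linalg.norm(H_star)


def performance(H, R_ss, R_nn):
    """MSNR performance function (2.25)."""
    return float(H @ R_ss @ H) / float(H @ R_nn @ H)


# Hypothetical 3 x 3 correlation matrices, not taken from the thesis data.
R_nn = np.array([[2.0, 0.5, 0.0], [0.5, 1.5, 0.2], [0.0, 0.2, 1.0]])
R_ss = np.array([[1.0, 0.3, 0.1], [0.3, 0.8, 0.0], [0.1, 0.0, 0.5]])

J_max, H_star = msnr_optimum(R_ss, R_nn)
residual = (R_ss - J_max * R_nn) @ H_star                    # (2.109)
fixed_point = np.linalg.solve(R_nn, R_ss) @ H_star / J_max   # (2.110)
print(J_max, performance(H_star, R_ss, R_nn))
```

The residual of (2.109) is zero up to rounding, and applying the transform matrix to $H^*$ returns $H^*$, which is exactly the fixed-point property the AT method is built on.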
In order to accelerate the convergence of such an adaptive filter, a gradient search term is added to the filter update. The steepest descent search direction is adopted, and the "best step" concept is used partially to compute the adaptation gain.

At iteration step K+1, the filter update equation is:



$$H_{K+1} = \frac{1}{J_K} R_{NN}^{-1} R_{SS} H_K + \alpha_K G_K \qquad (2.111)$$



The transform matrix $M_K$ is defined as:

$$M_K \triangleq \frac{1}{J_K} R_{NN}^{-1} R_{SS} \qquad (2.112)$$

Using (2.112) in (2.111), we obtain:

$$H_{K+1} = M_K H_K + \alpha_K G_K \qquad (2.113)$$



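The adaptation gain for the AT method can be obtained from Lemma 2.05, which, with the search direction $G_K$, reads:

$$G_{K+1}^T G_K = 0 \qquad (2.57)$$

From (2.83) and (2.98), the gradient $G_{K+1}$ is

$$G_{K+1} = \frac{2}{H_{K+1}^T R_{NN} H_{K+1}} \, Q_{K+1} H_{K+1} \qquad (2.114)$$

where $Q_{K+1} = R_{SS} - J_{K+1} R_{NN}$.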
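Using (2.113) and (2.114) in (2.57), we obtain:

$$\left[\frac{2}{H_{K+1}^T R_{NN} H_{K+1}} \, Q_{K+1}\left(M_K H_K + \alpha_K G_K\right)\right]^T G_K = 0 \qquad (2.115)$$

Since the scalar factor is positive, this reduces to:

$$\left[Q_{K+1} M_K H_K + \alpha_K Q_{K+1} G_K\right]^T G_K = 0 \qquad (2.116)$$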



(2.116) can be viewed as a dot product between two orthogonal vectors, and, since $Q_{K+1}$ is symmetric, the expression can be rewritten as:

$$G_K^T Q_{K+1} M_K H_K + \alpha_K G_K^T Q_{K+1} G_K = 0 \qquad (2.117)$$

Solving (2.117) for $\alpha_K$, we obtain:



$$\alpha_K = -\frac{G_K^T Q_{K+1} M_K H_K}{G_K^T Q_{K+1} G_K} \qquad (2.118)$$



Since $Q_{K+1}$ is not known, we use Lemma 2.07 to approximate $\alpha_K$:

$$\alpha_K \approx -\frac{G_K^T Q_K H_K}{G_K^T Q_K G_K} \qquad (2.119)$$

In order for the adaptation gain in (2.119) to be acceptable (i.e., for the adaptive filter to converge), it must satisfy condition (2.90).

The adaptive filter designed by the AT method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and a stopping bound $\delta$.

[Step 2] Compute the performance function as in (2.25):

$$J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K}$$

[Step 3] Compute the gradient $G_K = \nabla_H J_K$:

$$G_K = \frac{2}{H_K^T R_{NN} H_K} \, Q_K H_K, \qquad Q_K = R_{SS} - J_K R_{NN}$$

[Step 4] Compute the adaptation gain:

$$\alpha_K = -\frac{G_K^T Q_K H_K}{G_K^T Q_K G_K}$$

[Step 5] Update the filter vector $H_K$ according to (2.113):

$$H_{K+1} = M_K H_K + \alpha_K G_K$$

[Step 6] Test for stopping condition.

If $\|H_{K+1} - H_K\| / \|H_K\| \le \delta$, then terminate the adaptation; otherwise go to Step 2.
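The six AT steps can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the thesis program: it assumes $Q_K = R_{SS} - J_K R_{NN}$ (the matrix that makes $G_K$ the gradient of (2.25)), uses the approximate gain (2.119), and, as our own stand-in for condition (2.90), keeps the gradient correction only when it yields at least as large a J as the pure transform step $M_K H_K$. The names `snr` and `amir_transform` and the test matrices are hypothetical: with $R_{NN} = L L^T$ and $R_{SS} = L D L^T$, $R_{NN}^{-1} R_{SS}$ is similar to D, so $J_{max} = \max(D) = 9$ is known in advance.

```python
import numpy as np


def snr(H, R_ss, R_nn):
    """MSNR performance function J = H^T R_SS H / H^T R_NN H, cf. (2.25)."""
    return float(H @ R_ss @ H) / float(H @ R_nn @ H)


def amir_transform(R_ss, R_nn, H0, delta=1e-10, max_iter=500):
    """Sketch of the AT steps; returns the last filter and the history of J_K."""
    T = np.linalg.solve(R_nn, R_ss)                 # R_NN^{-1} R_SS
    H = np.asarray(H0, dtype=float)                 # Step 1
    history = [snr(H, R_ss, R_nn)]                  # Step 2
    for _ in range(max_iter):
        J = history[-1]
        psi = float(H @ R_nn @ H)                   # psi_K, cf. (2.158)
        Q = R_ss - J * R_nn                         # assumed Q_K = R_SS - J_K R_NN
        G = (2.0 / psi) * (Q @ H)                   # Step 3: gradient of J
        H_transform = (T @ H) / J                   # M_K H_K, the alpha_K = 0 step
        H_new = H_transform
        denom = float(G @ Q @ G)
        if denom != 0.0:
            alpha = -float(G @ Q @ H) / denom       # Step 4: gain (2.119)
            H_try = H_transform + alpha * G         # Step 5: update (2.113)
            # Safeguard (our assumption, standing in for condition (2.90)):
            # keep the gradient term only if J is at least as large as with
            # the pure transform step.
            if snr(H_try, R_ss, R_nn) >= snr(H_transform, R_ss, R_nn):
                H_new = H_try
        history.append(snr(H_new, R_ss, R_nn))
        # Step 6: relative change of the filter vector
        done = np.linalg.norm(H_new - H) / np.linalg.norm(H) <= delta
        H = H_new
        if done:
            break
    return H, history


# Hypothetical test matrices: with R_NN = L L^T and R_SS = L D L^T,
# R_NN^{-1} R_SS is similar to D, so J_max = max(D) = 9.
rng = np.random.default_rng(1)
n = 9
L = np.eye(n) + 0.3 * np.tril(rng.standard_normal((n, n)), -1)
R_nn = L @ L.T
R_ss = L @ np.diag([9.0, 5.0, 3.0, 2.0, 1.5, 1.2, 1.0, 0.8, 0.5]) @ L.T
H, history = amir_transform(R_ss, R_nn, np.ones(n))
print(len(history) - 1, history[-1])
```

The pure transform step is a power iteration on $R_{NN}^{-1} R_{SS}$, which for symmetric positive definite $R_{SS}$ and $R_{NN}$ never decreases J; the safeguard therefore keeps $\{J_K\}$ ascending, which is the property condition (2.90) asks for.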

E. CONVERGENCE AND CONVERGENCE RATE OF THE GRADIENT METHODS 
1. SD Adaptive Filter

Theorem 2.09

For any starting filter vector $H_0$, the sequence $\{H_K\}$ of the adaptive filter given by (2.56) converges to the unique optimal solution $H^*$ given by (2.12). Furthermore, the rate of convergence satisfies

$$p \triangleq \frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} \le \beta \, \frac{1 - C}{1 + C} \qquad (2.120)$$

where C is the condition number of the Hessian matrix of the performance function J given in (2.10) and $\beta$ is a constant. The condition number is defined here as

$$C \triangleq \frac{\lambda_s}{\lambda_l} \qquad (2.121)$$

where $\lambda_l$ and $\lambda_s$ are the largest and smallest eigenvalues of the Hessian matrix $R_{XX}$ (so that $0 < C \le 1$).

Proof

The Kantorovich inequality is used to prove the theorem.

The functional $\Psi_K$ is defined as:

$$\Psi_K \triangleq (H_K - H^*)^T R_{XX} (H_K - H^*) \qquad (2.122)$$

where $R_{XX}$ is the Hessian matrix of the performance function in (2.10). For the adaptive filter, $R_{XX}$ is the correlation matrix of the observed image signal.

$H^*$ is the filter which minimizes (2.10); the filter vector $H^*$ is called the optimal filter. At iteration step K+1, $\Psi$ becomes:

$$\Psi_{K+1} = (H_{K+1} - H^*)^T R_{XX} (H_{K+1} - H^*) \qquad (2.123)$$

Using (2.56) to substitute for $H_{K+1}$ in (2.123), we have

$$\Psi_{K+1} = (H_K - \alpha_K G_K - H^*)^T R_{XX} (H_K - \alpha_K G_K - H^*) \qquad (2.124)$$

Using the definition (2.122) and the fact that $R_{XX}$ is a symmetric matrix, we obtain:

$$\Psi_{K+1} = \Psi_K - 2\alpha_K G_K^T R_{XX} (H_K - H^*) + \alpha_K^2 G_K^T R_{XX} G_K \qquad (2.125)$$



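The adaptation gain $\alpha_K = G_K^T G_K / (2\, G_K^T R_{XX} G_K)$ is given by (2.66), so that $\alpha_K^2 G_K^T R_{XX} G_K = \frac{1}{2}\alpha_K G_K^T G_K$. Using this in (2.125), we obtain:

$$\Psi_{K+1} = \Psi_K - 2\alpha_K G_K^T R_{XX}(H_K - H^*) + \frac{1}{2}\alpha_K G_K^T G_K \qquad (2.126)$$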



Using (2.63) and (2.12), the middle term of (2.126) simplifies through:

$$R_{XX}(H_K - H^*) = R_{XX} H_K - R_{XS} = \frac{1}{2} G_K \qquad (2.127)$$

Substitution of (2.127) in (2.126) gives:

$$\Psi_{K+1} = \Psi_K - \alpha_K G_K^T G_K + \frac{1}{2}\alpha_K G_K^T G_K = \Psi_K - \frac{1}{2}\alpha_K G_K^T G_K \qquad (2.128)$$



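Let us define the vector $E_K$ as

$$E_K \triangleq H_K - H^* \qquad (2.129)$$

Using this definition (2.129) in (2.122), we obtain:

$$\Psi_K = E_K^T R_{XX} E_K \qquad (2.130)$$

From (2.127), we obtain:

$$E_K = \frac{1}{2} R_{XX}^{-1} G_K \qquad (2.131)$$

Using (2.131) and the fact that $R_{XX}$ is symmetric, (2.130) becomes:

$$\Psi_K = \frac{1}{4} G_K^T R_{XX}^{-1} G_K \qquad (2.132)$$

Using (2.132) in (2.128), we obtain:

$$\Psi_{K+1} = \Psi_K \left[1 - \frac{\frac{1}{2}\alpha_K G_K^T G_K}{\frac{1}{4} G_K^T R_{XX}^{-1} G_K}\right] \qquad (2.133)$$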



Substitution of $\alpha_K$ given by (2.66) in (2.133) gives:

$$\Psi_{K+1} = \Psi_K \left[1 - \frac{(G_K^T G_K)^2}{(G_K^T R_{XX} G_K)(G_K^T R_{XX}^{-1} G_K)}\right] \qquad (2.134)$$
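Now the Kantorovich inequality is used:

$$\frac{(G_K^T G_K)^2}{(G_K^T R_{XX} G_K)(G_K^T R_{XX}^{-1} G_K)} \ge \frac{4\lambda_s \lambda_l}{(\lambda_s + \lambda_l)^2} \qquad (2.135)$$

where $\lambda_s$ and $\lambda_l$ are the smallest and largest eigenvalues of the matrix $R_{XX}$.

Using (2.135) in (2.134), we obtain:

$$\Psi_{K+1} \le \Psi_K \left[1 - \frac{4\lambda_s \lambda_l}{(\lambda_s + \lambda_l)^2}\right] = \Psi_K \left(\frac{\lambda_l - \lambda_s}{\lambda_l + \lambda_s}\right)^2 \qquad (2.136)$$

so that

$$\frac{\Psi_{K+1}}{\Psi_K} \le \left(\frac{\lambda_l - \lambda_s}{\lambda_l + \lambda_s}\right)^2 < 1 \qquad (2.137)$$

Again, the condition number of the matrix $R_{XX}$ is defined as

$$C \triangleq \frac{\lambda_s}{\lambda_l} \qquad (2.138)$$

so (2.137) becomes

$$\frac{\Psi_{K+1}}{\Psi_K} \le \left(\frac{1 - C}{1 + C}\right)^2 < 1 \qquad (2.139)$$

Since $R_{XX}$ is a positive definite matrix, $\{\Psi_K\}$ is a positive sequence.

Let us define

$$q \triangleq \left(\frac{1 - C}{1 + C}\right)^2 < 1 \qquad (2.140)$$

Then we obtain

$$\Psi_{K+1} \le q\,\Psi_K \le q^{K+1}\,\Psi_0 \qquad (2.141)$$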
From (2.141), we can see that as $K \to \infty$ the sequence $\{\Psi_K\}$ converges to zero, because it is a decreasing positive sequence bounded by $q^K \Psi_0$; thus $\lim_{K\to\infty} \Psi_K = 0$. This implies $\lim_{K\to\infty} E_K = 0$ (use (2.130) and the positive definiteness of $R_{XX}$ to justify this statement). Since $E_K$ is defined as $H_K - H^*$ in (2.129), we conclude that:

$$\lim_{K\to\infty} H_K = H^* \qquad (2.142)$$

This completes the proof of convergence.

From (2.139), we observe that the rate of convergence of the sequence $\{\Psi_K\}$ is given by (2.140). However, since $\Psi_K$ as defined in (2.122) is a quadratic function of the vector $E_K = H_K - H^*$, it satisfies

$$\lambda_s \|H_K - H^*\|^2 \le \Psi_K \le \lambda_l \|H_K - H^*\|^2 \qquad (2.143)$$

and therefore

$$\frac{\|H_{K+1} - H^*\|^2}{\|H_K - H^*\|^2} \le \beta^2 \, \frac{\Psi_{K+1}}{\Psi_K} \qquad (2.144)$$

where $\beta$ is a positive number. Thus we obtain:

$$p = \frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} \le \beta \, \frac{1 - C}{1 + C} \qquad (2.145)$$

Theorem 2.09 proves that the SD method exhibits at least linear convergence. (Q.E.D.)

Definition

An algorithm with the property that

$$p = \lim_{K\to\infty} \frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} = \text{constant} < 1$$

is said to exhibit linear convergence.
Linear convergence is sometimes called geometric convergence, since it follows from the definition that, for large K, $\|H_{K+j} - H^*\| \approx p^j \|H_K - H^*\|$.

The speed of convergence of the SD method is a function of the condition number C: the more ill-conditioned $R_{XX}$ is (the smaller C), the slower the rate of convergence.

Theorem 2.09 used the quadratic mMSE performance function. For the MSNR performance function, it was shown (II.E.1) that the sequence generated by the SD method converges. Test results showed that SD converges more slowly for MSNR than for mMSE.
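Theorem 2.09 can be checked numerically. The sketch below runs best-step steepest descent on a hypothetical mMSE problem (random but fixed correlation matrices, not the infrared test images) and compares each ratio $\Psi_{K+1}/\Psi_K$ with the bound q of (2.140). It assumes the gain (2.66) in the form $\alpha_K = G_K^T G_K / (2 G_K^T R_{XX} G_K)$, the form that makes (2.133) reduce to (2.134); `sd_mmse` is a hypothetical name.

```python
import numpy as np


def sd_mmse(R_xx, R_xs, H0, n_steps):
    """Best-step steepest descent (2.56) on the mMSE performance function.

    Returns the last filter and the list of Psi_K = E_K^T R_XX E_K, (2.130).
    """
    H_star = np.linalg.solve(R_xx, R_xs)          # optimal filter (2.12)
    H = np.asarray(H0, dtype=float)
    psi = []
    for _ in range(n_steps):
        E = H - H_star                            # E_K, (2.129)
        psi.append(float(E @ R_xx @ E))
        G = 2.0 * (R_xx @ H - R_xs)               # gradient of J
        gg = float(G @ G)
        if gg == 0.0:
            break
        alpha = gg / (2.0 * float(G @ R_xx @ G))  # best step, assumed form of (2.66)
        H = H - alpha * G                         # update (2.56)
    return H, psi


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 9))
D = np.diag(np.linspace(1.0, 3.0, 9))
R_xx = D @ (X.T @ X / 500) @ D + 0.05 * np.eye(9)   # hypothetical R_XX
R_xs = rng.standard_normal(9)                       # hypothetical R_XS
lam = np.linalg.eigvalsh(R_xx)                      # ascending eigenvalues
C = lam[0] / lam[-1]                                # C = lambda_s / lambda_l
q = ((1.0 - C) / (1.0 + C)) ** 2                    # bound q of (2.140)
H, psi = sd_mmse(R_xx, R_xs, np.zeros(9), 20)
ratios = [b / a for a, b in zip(psi, psi[1:])]
print(q, max(ratios))
```

Every observed ratio should stay at or below q, as (2.139) predicts; the closer C is to zero, the closer q is to one and the slower the descent.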

2. ASD Adaptive Filter

The algorithm is illustrated in Fig. 2.01.

Fig. 2.01 The ASD algorithm.

The steepest descent steps are labeled SD, and the accelerated steps are labeled ASD.

Shah, Buehler and Kempthorne [53] showed that for an n-dimensional quadratic function, the sequence of iterates $H_0, H_2, \ldots$ is identical to the full sequence of iterates generated by conjugate gradient descent. Since conjugate gradient descent takes no more than n steps to reach the minimum of an n-dimensional quadratic function, accelerated steepest descent takes no more than (2n-1) steps.

Applying the ASD method to design a multidimensional adaptive filter using the real test images has shown poor convergence speed for both the mMSE and MSNR performance functions. The reason is error propagation: these methods are sensitive to propagated errors, so the conditions for accelerated convergence are not satisfied.
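A minimal sketch of accelerated steepest descent on a quadratic mMSE function follows. It is written in the usual parallel-tangents (PARTAN) form, which we assume corresponds to the ASD method of [53]; the names `line_min` and `accelerated_sd` and the matrices are hypothetical. In exact arithmetic it reaches the minimum of an n-dimensional quadratic with at most 2n-1 exact line searches.

```python
import numpy as np


def line_min(R, b, H, V):
    """Exact minimizer of J(H) = H^T R H - 2 b^T H along the line H + t V."""
    G = 2.0 * (R @ H - b)
    t = -float(G @ V) / (2.0 * float(V @ R @ V))
    return H + t * V


def accelerated_sd(R, b, H0, tol=1e-14):
    """Parallel-tangents form of accelerated steepest descent on a quadratic.

    Returns the last filter and the number of line searches performed.
    """
    n = len(b)
    H_prev = np.asarray(H0, dtype=float)
    G0 = 2.0 * (R @ H_prev - b)
    if np.linalg.norm(G0) < tol:
        return H_prev, 0
    H = line_min(R, b, H_prev, -G0)                        # first SD step
    searches = 1
    for _ in range(n - 1):
        G = 2.0 * (R @ H - b)
        if np.linalg.norm(G) < tol:
            break
        Y = line_min(R, b, H, -G)                          # SD step
        H_prev, H = H, line_min(R, b, H_prev, Y - H_prev)  # accelerated step
        searches += 2
    return H, searches


rng = np.random.default_rng(2)
n = 6
X = rng.standard_normal((200, n))
R = X.T @ X / 200 + 0.1 * np.eye(n)     # hypothetical R_XX
b = rng.standard_normal(n)               # hypothetical R_XS
H, searches = accelerated_sd(R, b, np.zeros(n))
print(searches, np.linalg.norm(H - np.linalg.solve(R, b)))
```

Each accelerated step searches along the line through the point two steps back, which is what lets the method reproduce the conjugate gradient iterates on a quadratic.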

3. CGF Adaptive Filter

The conjugate gradient methods CGP and CGF exhibit quadratic termination (apart from rounding errors) for the mMSE performance function. Quadratic termination means that for a quadratic performance function the minimum is guaranteed to be located exactly (apart from rounding errors) in no more than n steps. However, for nonquadratic functions like (2.25), the conjugate gradient method does not exhibit quadratic termination. For the infinite-dimensional case, Daniel [60] showed that the rate of convergence is:

$$p = \frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} \le \frac{\sqrt{\lambda_l} - \sqrt{\lambda_s}}{\sqrt{\lambda_l} + \sqrt{\lambda_s}}$$

where $\lambda_l$ and $\lambda_s$ are the largest and smallest eigenvalues of the Hessian matrix of the performance function J.

Depending on the design approach, adaptive filters for nonlinear, nonquadratic performance functions attain different rates of convergence. Some approaches exhibit quadratic convergence (those which approximate the performance function by a Taylor series expansion); others exhibit superlinear convergence.



Theorem 2.10

Let the performance function J be defined by (2.10) and the adaptive filter be designed using the conjugate gradient method. Then the sequence of adaptive filters $\{H_K\}$ converges in no more than n steps to the unique minimum $H^*$ of the performance function J.

Proof

The proof is based on the fact that both methods, CGF and CGP, are conjugate direction search methods, which implies that the step direction vector $V_K$ is orthogonal to the gradient of the performance function J at iteration step K+1. This fact is stated as [55, 61]:

$$V_K^T G_{K+1} = 0 \qquad (2.146)$$
The adaptation equation is:

$$H_{K+1} = H_K + \alpha_K V_K \qquad (2.147)$$

Its expression at iteration step n can be related to all steps from iteration step K by:

$$H_n = H_{K+1} + \sum_{j=K+1}^{n-1} \alpha_j V_j \qquad (2.148)$$

for any $0 \le K \le n-1$.

The gradient of the performance function J at iteration step n is given by:

$$G_n = 2\left(R_{XX} H_n - R_{XS}\right) \qquad (2.149)$$

By substituting (2.148) in (2.149), we get:

$$G_n = G_{K+1} + 2\sum_{j=K+1}^{n-1} \alpha_j R_{XX} V_j \qquad (2.150)$$



Premultiplying (2.150) by $V_K^T$ and using (2.146), we obtain:

$$V_K^T G_n = 2\sum_{j=K+1}^{n-1} \alpha_j V_K^T R_{XX} V_j \qquad (2.151)$$



The conjugate gradient method is based on generating a conjugate sequence of step direction vectors $\{V_K\}$.

The conjugacy condition is:

$$V_K^T R_{XX} V_j = 0 \quad \text{for } K \ne j \qquad (2.152)$$

We use (2.152) in (2.151) to show that:

$$V_K^T G_n = 0, \quad 0 \le K \le n-1 \qquad (2.153)$$
The step direction vectors $V_0, V_1, \ldots, V_{n-1}$ form a complete conjugate basis. Therefore, at iteration step n, the only $G_n$ which satisfies (2.153) is

$$G_n = 0 \qquad (2.154)$$

But for the quadratic performance function, the gradient vanishes only at the minimum. So we have proved that the method converges to the minimum of J in (2.10) in no more than n steps.

Substituting (2.154) in (2.149), we obtain

$$R_{XX} H_n - R_{XS} = 0 \qquad (2.155)$$

$$H_n = R_{XX}^{-1} R_{XS} \qquad (2.156)$$

So

$$H_n = H^* \qquad (2.157)$$

Thus the filter converges to the unique minimum of J.

Q.E.D.

In practical applications, it was found that the conjugate gradient methods sometimes converge in more than n steps. The reason is round-off error: the two conditions (2.146) and (2.152) are not satisfied exactly, so the sequence $\{V_K\}$ of step directions does not form a complete basis in n iteration steps.

For the MSNR case, the adaptive filter could not converge as fast as in the mMSE case because the MSNR performance function J is nonquadratic.
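The quadratic termination of Theorem 2.10 can be observed with a short Fletcher-Reeves sketch that uses exact line searches on a hypothetical mMSE problem. It is an illustration under those assumptions, not the CGF program used for the test images, and `cg_fletcher_reeves` is a hypothetical name.

```python
import numpy as np


def cg_fletcher_reeves(R_xx, R_xs, H0, n_steps):
    """Fletcher-Reeves conjugate gradient with exact line search on the mMSE J."""
    H = np.asarray(H0, dtype=float)
    G = 2.0 * (R_xx @ H - R_xs)              # gradient (2.149)
    V = -G                                   # first direction: steepest descent
    for _ in range(n_steps):
        if np.linalg.norm(G) < 1e-14:
            break
        alpha = -float(G @ V) / (2.0 * float(V @ R_xx @ V))   # exact line search
        H = H + alpha * V                                     # update (2.147)
        G_new = 2.0 * (R_xx @ H - R_xs)
        beta = float(G_new @ G_new) / float(G @ G)            # Fletcher-Reeves ratio
        V = -G_new + beta * V                                 # next conjugate direction
        G = G_new
    return H


rng = np.random.default_rng(3)
n = 9
X = rng.standard_normal((300, n))
R_xx = X.T @ X / 300 + 0.1 * np.eye(n)   # hypothetical R_XX
R_xs = rng.standard_normal(n)            # hypothetical R_XS
H_star = np.linalg.solve(R_xx, R_xs)     # (2.156)
for steps in (3, 6, 9):
    H = cg_fletcher_reeves(R_xx, R_xs, np.zeros(n), steps)
    print(steps, np.linalg.norm(H - H_star))
```

After n = 9 steps the error should be at rounding level; with a badly conditioned $R_{XX}$ the loss of conjugacy discussed above becomes visible as a residual error after n steps.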
4. The DFP Method

The variable metric DFP method exhibits quadratic termination, apart from rounding errors, for the mMSE performance function.

Fletcher and Powell [58] proved for a general performance function J that a positive definite variable metric $A_K$ implies a positive definite $A_{K+1}$, updated by (2.77). They showed that for a quadratic function like the mMSE type, the successive filter updates $\Delta H_0, \Delta H_1, \ldots$ form a set of conjugate directions and that $A_n$ equals the inverse of the Hessian matrix, so the DFP algorithm exhibits quadratic termination in n steps.

The MSNR performance function is nonquadratic and nonlinear, so the DFP method cannot exhibit quadratic termination. According to our test results, however, it is still a fast-converging technique. If the method converges slowly, it is recommended to restart the variable metric every n+1 steps by setting $A_K = I$ to overcome round-off errors.
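A sketch of the DFP update on the quadratic mMSE function, assuming $A_0 = I$ and exact line searches, shows both properties attributed to [58]: termination in n steps, and $A_n$ equal to the inverse Hessian, which for (2.10) with gradient $2(R_{XX} H - R_{XS})$ is $(2R_{XX})^{-1}$. The function name `dfp_mmse` and the matrices are hypothetical.

```python
import numpy as np


def dfp_mmse(R_xx, R_xs, H0, n_steps):
    """DFP variable metric search with exact line search on the mMSE J."""
    n = len(R_xs)
    H = np.asarray(H0, dtype=float)
    A = np.eye(n)                                # variable metric A_0 = I
    G = 2.0 * (R_xx @ H - R_xs)
    for _ in range(n_steps):
        if np.linalg.norm(G) < 1e-14:
            break
        V = -A @ G                                           # search direction
        alpha = -float(G @ V) / (2.0 * float(V @ R_xx @ V))  # exact line search
        dH = alpha * V                                       # filter update
        H = H + dH
        G_new = 2.0 * (R_xx @ H - R_xs)
        Y = G_new - G                                        # gradient change
        AY = A @ Y
        A = A + np.outer(dH, dH) / float(dH @ Y) - np.outer(AY, AY) / float(Y @ AY)
        G = G_new
    return H, A


rng = np.random.default_rng(4)
n = 6
X = rng.standard_normal((300, n))
R_xx = X.T @ X / 300 + 0.1 * np.eye(n)   # hypothetical R_XX
R_xs = rng.standard_normal(n)            # hypothetical R_XS
H_star = np.linalg.solve(R_xx, R_xs)
H, A = dfp_mmse(R_xx, R_xs, np.zeros(n), n)
print(np.linalg.norm(H - H_star), np.abs(A - np.linalg.inv(2.0 * R_xx)).max())
```

On the nonquadratic MSNR function neither property is guaranteed, which is why a periodic reset of the metric to the identity is suggested above.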

5. The AT Adaptive Filter

The Amir transform adaptive filter exhibits very fast convergence. The reason lies in the way it was designed: each iterative step uses a transform to satisfy the steady-state generalized eigenvalue/eigenvector equation.

Theorem 2.11

Let the adaptive filter be updated by (2.111) and the performance function be defined by (2.25). Then the filter converges to the unique optimal filter $H^*$ if the adaptation gain satisfies condition (2.90).

Proof

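The adaptive filter $H_K$ is updated by (2.111):

$$H_{K+1} = \frac{1}{J_K} R_{NN}^{-1} R_{SS} H_K + \alpha_K G_K \qquad (2.111)$$

Substituting (2.83) for $G_K$ in (2.111), and defining

$$\psi_K \triangleq H_K^T R_{NN} H_K \qquad (2.158)$$

we obtain:

$$H_{K+1} = \frac{1}{J_K} R_{NN}^{-1} R_{SS} H_K + \frac{2\alpha_K}{\psi_K}\left(R_{SS} - J_K R_{NN}\right) H_K \qquad (2.159)$$

Rearranging (2.159) gives:

$$H_{K+1} = \left[\frac{1}{J_K} R_{NN}^{-1} R_{SS} + \frac{2\alpha_K J_K}{\psi_K} R_{NN}\left(\frac{1}{J_K} R_{NN}^{-1} R_{SS} - I\right)\right] H_K \qquad (2.160)$$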



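Subtracting $H_K$ from both sides, we obtain:

$$H_{K+1} - H_K = \left(I + \frac{2\alpha_K J_K}{\psi_K} R_{NN}\right)\left(\frac{1}{J_K} R_{NN}^{-1} R_{SS} - I\right) H_K \qquad (2.161)$$

where I is the identity matrix.

Let us define the matrix $Z_K$ as:

$$Z_K \triangleq I + \frac{2\alpha_K J_K}{\psi_K} R_{NN} \qquad (2.162)$$

Since $R_{NN}$ and $R_{SS}$ are positive definite and $\alpha_K$, $J_K$ and $\psi_K$ are positive numbers, $Z_K$ is positive definite. Since $\alpha_K$, $J_K$ and $\psi_K$ are bounded, the norm of the matrix $Z_K$ is bounded.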
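In other words, there exists a positive number A such that:

$$\|Z_K\| \le A \qquad (2.163)$$

Taking the norm of (2.161), using the inequality $\|UW\| \le \|U\|\,\|W\|$ for matrices U and W, and combining with (2.163), we obtain:

$$\|H_{K+1} - H_K\| \le A \left\|\frac{1}{J_K} R_{NN}^{-1} R_{SS} - I\right\| \, \|H_K\| \qquad (2.164)$$

which turns out to be

$$\frac{\|H_{K+1} - H_K\|}{\|H_K\|} \le A \left\|\frac{1}{J_K} R_{NN}^{-1} R_{SS} - I\right\| \qquad (2.165)$$

If the convergence error is defined as

$$\epsilon_K \triangleq \frac{\|H_{K+1} - H_K\|}{\|H_K\|} \qquad (2.166)$$

then

$$\epsilon_K \le A \left\|\frac{1}{J_K} R_{NN}^{-1} R_{SS} - I\right\| \qquad (2.167)$$

The largest eigenvalue of $\frac{1}{J_K} R_{NN}^{-1} R_{SS} - I$ is

$$\frac{J_{max}}{J_K} - 1 \qquad (2.168)$$

where $J_{max}$ is the largest eigenvalue of $R_{NN}^{-1} R_{SS}$:

$$J_{max} = \lambda_{max}\left(R_{NN}^{-1} R_{SS}\right) \qquad (2.169)$$

But

$$\left\|\frac{1}{J_K} R_{NN}^{-1} R_{SS} - I\right\| = \frac{J_{max}}{J_K} - 1 \qquad (2.170)$$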
So

$$\epsilon_K \le A\left(\frac{J_{max}}{J_K} - 1\right) \qquad (2.171)$$



The adaptation gain is designed to satisfy condition (2.90), which states that:

$$J_{K+1} - J_K \ge 0$$

Updating (2.171) and using condition (2.90), we have:



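$$\text{I.}\quad \epsilon_{K+1} \le A\left(\frac{J_{max}}{J_{K+1}} - 1\right)$$

$$\text{II.}\quad \epsilon_K \le A\left(\frac{J_{max}}{J_K} - 1\right) \qquad (2.172)$$

$$\text{III.}\quad A\left(\frac{J_{max}}{J_{K+1}} - 1\right) \le A\left(\frac{J_{max}}{J_K} - 1\right)$$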

Thus the sequence $\{\epsilon_K\}$ converges to zero, because $J_{max}$ is the maximum value of the unimodal performance function and the sequence $\{J_K\}$ is an ascending sequence bounded by the upper bound $J_{max}$, so

$$\lim_{K\to\infty} \epsilon_K \le \lim_{K\to\infty} A\left(\frac{J_{max}}{J_K} - 1\right) = A\left(\frac{J_{max}}{J_{max}} - 1\right) = 0 \qquad (2.173)$$



This proves that the filter converges. Denoting the limits of $J_K$ and $H_K$ by $J_\infty$ and $H_\infty$, at the convergence point (2.170) satisfies

$$\left\|\frac{1}{J_\infty} R_{NN}^{-1} R_{SS} - I\right\| = 0 \qquad (2.174)$$

or, in the vector form,

$$\left(\frac{1}{J_\infty} R_{NN}^{-1} R_{SS} - I\right) H_\infty = 0 \qquad (2.175)$$

(2.175) is the equation for the stationary points of the performance function J. Thus $J_\infty = J_{max}$ and, correspondingly, $H_\infty = H^*$.

So the adaptive filter converges to the unique optimum.

(Q.E.D.)

F. PRESENTATION OF RESULTS 

1. Organization of Results

The performance of both the mMSE and the MSNR nonrecursive adaptive spatial filters has been evaluated extensively on two real-world infrared images, shown in Figs. 2.1 and 2.2. Before the results are presented in detail, the organization of the evaluations is described.

(a) Filter type:

- Nonrecursive adaptive spatial filter

- Search box (filter size) of 3 by 3 pixels, with the estimation pixel in the middle of the filter

(b) Optimization criterion and performance function:

- mMSE: Minimization of mean square error

- MSNR: Maximization of signal to noise ratio

(c) Adaptation equation:

1. LMS approach:

$$H_{K+1} = H_K - 2\mu\,\epsilon\,X$$
Fig. 2.1 A 9-level computer print of the Indiana infrared test image.

Fig. 2.2 A 9-level computer print of the China Lake infrared test image.
2. Gradient approaches:

$$H_{K+1} = H_K \mp \alpha_K G_K$$

(minus sign for mMSE minimization, plus sign for MSNR maximization)

3. Conjugate gradient approaches:

$$H_{K+1} = H_K + \alpha_K V_K$$

4. Variable metric approach:

$$H_{K+1} = H_K + \Delta H_K$$

5. Amir's transform approach:

$$H_{K+1} = M_K H_K + \alpha_K G_K$$

(d) Search methods:

1. LMS approach:

Steepest descent method

2. Gradient approaches:

Steepest descent method
Accelerated steepest descent method
Amir's method (applies only to the mMSE case)

3. Conjugate gradient approaches:

Fletcher-Reeves method
Pollack method

4. Variable metric approach:

Davidon-Fletcher-Powell method

5. Amir's transform approach:

Applies only to the MSNR case

(e) Test images used:

1. Indiana image (Fig. 2.1):

32 x 32 pixels

Blue spike infrared spectral band

An image taken of a city in Indiana and used quite extensively as a standard test image for high-altitude, downward-looking infrared surveillance systems.

2. China Lake image (Fig. 2.2):

32 x 32 pixels

Thermal infrared band in the 10-13 μm range

An image taken of a desert area in China Lake, California, with a highway in the picture. It has been used as one of the standard test images for short-distance, side-looking infrared target acquisition systems.

(f) Performance evaluation:

The performance of the adaptive filters is presented in four different ways, all as functions of the number of iterative steps N.

1. Filter coefficients as a function of N (9 coefficients for a 3 x 3 spatial filter).

2. Output variance as a function of N.

3. Processing gain as a function of N. The processing gain is defined as follows:

$$PG = 10 \log_{10} \frac{m_i^2 + \sigma_i^2}{m_o^2 + \sigma_o^2}$$

where $m_i$, $m_o$ are the means of the input and filtered images, respectively, and $\sigma_i^2$, $\sigma_o^2$ are the variances of the input and filtered images, respectively.

4. Output signal to noise ratio (used only in MSNR cases). The output SNR of the filtered image is defined as follows:

$$SNR_o = \frac{H^T R_{SS} H}{H^T R_{NN} H}$$

where H is the filter vector, $R_{SS}$ is the target signal correlation matrix, and $R_{NN}$ is the clutter noise correlation matrix.

2. Results of mMSE Adaptive Spatial Filter I - Indiana Image

The test results of adaptive filters based on the mMSE criterion and using the Indiana test image are presented in the following figures:

Fig. 2.3 - LMS approach, steepest descent method

Fig. 2.4 - Gradient approach, steepest descent method

Fig. 2.5 - Gradient approach, accelerated steepest descent method

Fig. 2.6 - Gradient approach, Amir's method

Fig. 2.7 - Conjugate gradient approach, Fletcher-Reeves method

Fig. 2.8 - Conjugate gradient approach, Pollack method

Fig. 2.9 - Variable metric approach, Davidon-Fletcher-Powell method

In each figure, three results (the nine filter coefficients, the output variance and the processing gain) are presented as functions of the iteration steps.
[Figure panels: LMS Algorithm; Figs. 2.4a-2.4c (horizontal axis: iteration); Accelerated Steepest Descent - mMSE; Fletcher-Reeves Method - mMSE]
[Table: Blue spike Indiana infrared image, 32 x 32 pixels. Before filtering: mean = 3.3039, variance = 0.74111. *These results are exactly the same as those of the optimum mMSE filter.]



The following additional numerical results are 
presented in Table II-l: 

- Processing gain 

- Mean of the filtered image 

- Variance of the filtered image 

- Number of iteration steps to go below the prescribed 
error 

- Actual adaptation error when the adaptation is 
stopped. 

a. Discussion

These results will be discussed in several groups.

(1) LMS Approach and Steepest Descent Method. This approach is the two-dimensional extension of the most widely used adaptive filter technique. In Fig. 2.3, we can see that the adaptation took close to one thousand steps to reach the minimum of the output variance and the maximum of the processing gain. However, the adaptation never achieved a steady state, even after 10,000 iteration steps.

Further, there is a steady-state deviation from the optimum output variance. It is known as the "misadjustment", which commonly occurs in the traditional adaptive filter approach [23].

We believe these problems are consequences of the basic assumptions of the LMS algorithm. The reasons are probably not obvious if we follow the traditional adaptive concept initiated by Prof. Widrow, which uses the error-signal concept from control, $\epsilon = H^T X - d$. The filtered output $H^T X$ is compared with a desired result, d. Their difference, $\epsilon$, is used together with a constant, but adjustable, adaptive gain, $2\mu$, to form a correction term, $\Delta H$, that moves the filter coefficients toward the optimization goal, which is the minimization of the mean square error.

On the other hand, if we consider the adaptation procedure as an optimization process, then the adaptation equation takes the form

$$H_{K+1} = H_K - \alpha_K G_K$$

where $G_K$ is called the "gradient" and $\alpha_K$ is called the "step size." The gradient here is the gradient of the performance function surface, J. The product of the adaptation step size and the gradient is the correction term $\Delta H$.

It is postulated that the assumptions made by the LMS approach are not directly responsive to the goal of adaptation, because the error term $H^T X - d$ is not directly related to the minimization of the performance function. Further, the assumption that the adaptive gain $2\mu$, which corresponds to the step size in optimization, is constant does not agree with the fact that the iterative steps toward the optimum usually take place with varying step sizes. These problems contributed to the slow convergence and the steady-state misadjustment of the LMS adaptive spatial filters.
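The contrast between the LMS update and the gradient approach can be illustrated on synthetic data. In the sketch below (hypothetical random data, not the infrared images), LMS uses a constant gain $2\mu$ with one sample per step and the error $\epsilon = H^T X - d$, while the gradient approach uses the sample correlation matrices with the best step (2.66). On this example the gradient iterate should reach the optimum filter for these samples to machine precision, whereas the LMS coefficients keep fluctuating around it, which is the misadjustment described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 9, 4000
X = rng.standard_normal((m, n))                 # hypothetical observation vectors
w = rng.standard_normal(n)
d = X @ w + 0.1 * rng.standard_normal(m)        # hypothetical desired signal

R_xx = X.T @ X / m                              # sample correlation matrix
R_xs = X.T @ d / m                              # sample cross-correlation
H_star = np.linalg.solve(R_xx, R_xs)            # optimum for these samples

# LMS: one sample per step, constant gain 2*mu, error eps = H^T X - d.
mu = 0.01
H_lms = np.zeros(n)
for x_k, d_k in zip(X, d):
    eps = H_lms @ x_k - d_k
    H_lms = H_lms - 2.0 * mu * eps * x_k

# Gradient approach: gradient of the performance surface, best step (2.66).
H_sd = np.zeros(n)
for _ in range(50):
    G = 2.0 * (R_xx @ H_sd - R_xs)
    gg = G @ G
    if gg == 0.0:
        break
    H_sd = H_sd - gg / (2.0 * (G @ R_xx @ G)) * G

lms_err = np.linalg.norm(H_lms - H_star)
sd_err = np.linalg.norm(H_sd - H_star)
print(lms_err, sd_err)
```

Shrinking $\mu$ reduces the LMS fluctuation but slows its convergence further, which is the trade-off a constant gain cannot escape.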
We developed several adaptive filters using gradient methods from the optimization field. Their results are discussed in the following.

(2) Gradient Approaches. Three different methods were developed. Their results are shown in Figs. 2.4, 2.5 and 2.6 for the steepest descent (SD), accelerated steepest descent (ASD) and Amir's (AMM) methods, respectively.

The reasoning described above is convincingly supported by the following observations:

a. The adaptation converges faster. It took 541, 445 and 220 steps for the SD, ASD and AMM methods, respectively, to reach the stopping condition of an adaptation error less than $1.5 \times 10^{-4}$, as shown in Table II-1.

b. The adaptation procedure indeed reached steady state once the adaptation error was less than the stopping condition.

c. The steady-state error is smaller than that of the LMS algorithm, as shown in Table II-1. In fact, the output variance is equal to that of the optimum filter.

(3) Conjugate Gradient Approaches. Two different methods were developed. Their results are shown in Figs. 2.7 and 2.8 for the Fletcher-Reeves (CGF) and the Pollack (CGP) methods, respectively.

Again, the improvements are clearly seen. In fact, these methods are even better than the gradient methods: convergence took only 66 and 10 steps for the CGF and CGP methods, respectively, to reach below the same stopping condition. At the same time, the output variance is the same.

(4) Variable Metric Approach. Results of this approach, which is extended from the one-dimensional work of Davidon-Fletcher-Powell, are shown in Fig. 2.9. Again, the improvements are clearly seen. The background suppression result, measured by the output variance and processing gain, is the same, but the convergence speed is even better: it took only 9 iteration steps to reach below the stopping condition.

3. Results of mMSE Adaptive Spatial Filter II - China Lake Image

The test results of adaptive filters based on the mMSE criterion and using the China Lake test image are presented in the following figures:

Fig. 2.10 - LMS approach, steepest descent method

Fig. 2.11 - Gradient approach, steepest descent method

Fig. 2.12 - Gradient approach, accelerated steepest descent method

Fig. 2.13 - Gradient approach, Amir's method

Fig. 2.14 - Conjugate gradient approach, Fletcher-Reeves method

Fig. 2.15 - Conjugate gradient approach, Pollack method

Fig. 2.16 - Variable metric approach, Davidon-Fletcher-Powell method

In each figure, three results are presented as functions of iteration steps: filter coefficients, output variance and processing gain.

Further, additional results are summarized and presented in Table II-2:

- Processing gain 

- Mean of the filtered image 
- Variance of the filtered image

- Number of iteration steps to go below the prescribed stopping error

- Actual adaptation error when the adaptation is stopped.

a. Discussion

The results using the China Lake image are generally similar to those using the Indiana image. Only the important features will be summarized below.

(1) LMS Approach. The adaptation based on the LMS approach again shows three problems: slow convergence, failure to reach steady state, and misadjustment.

(2) New Approaches Developed in this Thesis. All new approaches achieve the same steady-state performance, equal to that of the optimum filter, as shown in Table II-2:

Mean of the filtered image = 6.495 * 10 ^ 

Variance of the filtered image = $1.2 \times 10^{-2}$

However, they converge to the steady-state value in far fewer steps, as also shown in Table II-2.

Therefore, test results on the China Lake image again demonstrate the improvements in adaptive filters using the approaches suggested in this thesis.

It is interesting to note that background clutter suppression is less effective for the China Lake image than for the Indiana image. For example, the processing gain for the China Lake image is 19.32 dB, compared with 29.874 dB for the Indiana image. We believe this difference is related to the spatial correlation of the image: the higher the correlation, the better the background clutter suppression. The Indiana image is more spatially correlated than the China Lake image.
[Figure panels: Steepest Descent - mMSE (Figs. 2.11a-2.11c); Amir's Method - mMSE; Pollack Method - mMSE; Davidon-Fletcher-Powell Method]
4. Results of MSNR Adaptive Spatial Filter I - Indiana Image

The test results of MSNR adaptive spatial filters using the Indiana test image are presented in the following figures:

Fig. 2.17 - Gradient approach, steepest descent method

Fig. 2.18 - Gradient approach, accelerated steepest descent method

Fig. 2.19 - Conjugate gradient approach, Fletcher-Reeves method

Fig. 2.20 - Conjugate gradient approach, Pollack method

Fig. 2.21 - Variable metric approach, Davidon-Fletcher-Powell method

Fig. 2.22 - Amir's transform approach.

In each figure, four results are presented as functions of iteration steps: filter coefficients, output variance, processing gain and output signal to noise ratio.

Further, additional numerical results are summarized and presented in Table II-3:

- Output signal to noise ratio

- Processing gain

- Mean of filtered image

- Variance of filtered image

- Number of iteration steps to reach below the prescribed stopping error

- Actual adaptation error.

Discussion:

a. In the mMSE adaptive filter study, we first presented the results of adaptive filter design by the LMS algorithm because it is the most frequently used method. We extended it to two dimensions and used it as a benchmark for comparison. For the MSNR criterion, we have not yet found any past study of adaptive filters using this criterion. Therefore, comparisons of convergence results are based on the several methods developed in this thesis.

b. However, we can compare the background clutter suppression results of the mMSE and MSNR adaptive filters. For point targets, their steady-state filter coefficients are the same if the coefficients of the estimation pixel are all normalized to unity. Therefore, the statistical properties of the filtered image are the same; i.e., the error variance and the mean of the image after processing by the two types of filters are identical. For the Indiana image, the mean and variance of the unfiltered and filtered images are:

              Before filtering    After filtering
  mean        3.30397             0.00006495
  variance    0.74111             0.012

c. The convergence speeds are different, as shown in Table II-3. For the prescribed stopping condition, the numbers of iteration steps needed to reach below it are:

  SD  = 739      ASD = 739
  CGF = 76       CGP = 76
  DFP = 25       AT  = 2
[Figure panels: Figs. 2.17a and 2.17c, Steepest Descent Method - MSNR; Fig. 2.18a, Accelerated Steepest Descent - MSNR; Fig. 2.22a, Amir's Transform Method - MSNR]

Table II-3 Results of MSNR Adaptive Spatial Filter (Indiana Image)
The same trend as in the mMSE cases is found for the MSNR cases. The variable metric method (DFP) is faster than the conjugate gradient methods (CGF, CGP), which are faster than the gradient methods (SD, ASD).

It is important to point out that the transform method (AT), which has no corresponding method in the mMSE cases, has the fastest convergence speed. It took only two steps, compared with the twenty-five steps required for the variable metric method, to reach below the stopping condition.

5. Results of MSNR Adaptive Spatial Filter II - China Lake Image

Test results of MSNR adaptive spatial filters using the China Lake test image are presented in the following figures:

Fig. 2.23 - Gradient approach, steepest descent method

Fig. 2.24 - Gradient approach, accelerated steepest descent method

Fig. 2.25 - Conjugate gradient approach, Fletcher-Reeves method

Fig. 2.26 - Conjugate gradient approach, Pollack method

Fig. 2.27 - Variable metric approach, Davidon-Fletcher-Powell method

Fig. 2.28 - Amir's transform method.



Several numerical results are presented in Table II-4:

- Output signal to noise ratio
- Processing gain
- Mean of filtered image
- Variance of filtered image
- Number of iteration steps to reach below the prescribed stopping error
- Actual adaptation error

Discussion 

a. Gradient approaches have not been included in 
these presentations because their convergence speeds are not 
as fast as the conjugate gradient, variable metric and Amir's 
transform methods. 

b. Again, Amir's transform method has the fastest convergence
speed. It took only three steps to reach below the stopping
condition, compared with the fifteen steps required by the next
fastest method, the variable metric method.

c. The comparative behavior of these methods is similar for the
Indiana image and the China Lake image.
Fig. 2.23a Steepest Descent Method - MSNR
Fig. 2.24c Accelerated Steepest Descent Method - MSNR
Fig. 2.25a Fletcher-Reeves Method - MSNR
Fig. 2.27d Davidon-Fletcher-Powell Method - MSNR



III. THE MULTIPLE MICROCOMPUTER SYSTEM 



A. INTRODUCTION 
1. General

Signal processing algorithms are usually developed on mainframe
computers. Transferring these algorithms to on-board processors in
practical systems is, in general, not an easy task, because real
systems impose many constraints, such as processing speed, weight,
volume, power, and fault tolerance. This thesis undertook both the
theoretical development and the practical implementation
investigation. Specifically, this chapter presents the second part
of this thesis, which considers the implementation of the adaptive
image processing algorithms developed in the previous chapter on a
multiple microcomputer system using concurrent parallel and pipeline
processing.

It is important to point out that the digital computer is not the
only technique for real-time implementation. Depending on the amount
and rate of signal data, the precision and dynamic range
requirements, the need for programmability, and several other
factors, different signal formats, device technologies, and
signal/data processor architectures should be considered. In many
cases, combinations of analog, sampled analog, and digital
processing approaches using optical, electronic, and acoustical
devices will probably offer cost-effective and optimum performance
[178-180]. However, with the rapid advances of VLSI/VHSIC
technologies in both increasing speed and decreasing power, size,
and cost, the importance of digital electronic implementation in the
form of distributed processing using multiple processors has been
increasing rapidly, and it will undoubtedly play an increasingly
important role in real on-board implementation.

This part of the thesis investigates the feasibility and potential
of using multiple microcomputer systems for practical implementation.

2. Multiple Processor Developments

Multiple microcomputer systems are a subset of the larger family of
multiple processor systems, whose development started over twenty
years ago. It has long been obvious that several processors are
better than one. However, how they should be connected together and
used effectively has not been obvious at all. The answer depends on
many factors. First, what is the objective? Is it real-time
processing, fault tolerance, multiple users, security, or some
combination of these? Second, what technologies are available in
both hardware and software? Third, what are the constraints in cost,
weight, volume, development time, and available manpower? The
answers have differed widely depending on these factors. Several
major areas of multiple processor development can be identified
since the early 1960's.






a. Supercomputers [151, 152]

The first area can be generally called "supercomputers." Several
processors were connected in different ways to offer parallel
processing [153-155], pipeline processing [156-158], or combined
parallel/pipeline processing capabilities. In some cases, specially
designed signal processors, called array processors, are connected
to a host computer to offer very fast data crunching capabilities.
In most of these cases, the basic processing elements forming the
multiple processor systems are special arithmetic or signal
processing units, not stand-alone computers. Their
intercommunications and signal flow are usually fixed at the design
stage to achieve very fast computing speed and are not changed
during operation. Several representative systems are listed in
Table III-1. Their common objective is "fast computation" and "high
throughput." The processing elements are tightly coupled.

b. Computer Networks [161, 162]

The second area can be generally called the "computer network."
Several processors are connected together for intercommunication and
resource sharing. The basic processing elements are mainly
stand-alone computers. A problem is usually not partitioned and
executed concurrently on several processing elements. The system is,
in general, loosely coupled. Communication is carried out by
messages with appropriate synchronization codes at the beginning and
end of each message.
c. Ultra-Reliable Fault Tolerant or Highly Available, Gracefully
Degrading Computers [166, 167]

The third area can be generally called "Fault Tolerant or Highly
Available" computers. Multiple processing elements have been
connected in different ways to offer fail-soft, fail-safe, or
graceful degradation capabilities. In most fault tolerant computers,
redundancy and/or sparing is provided at the building block level,
such as the CPU, RAM, and I/O ports, to make a single, very reliable
and fault tolerant computer [168]. The intercommunications among the
elements are generally fixed. In recent years, because of the steady
decrease in the cost of computers, the basic processing elements of
a multiple processor system can be a small number of stand-alone
computers [169-171]. These systems started a new direction in
multiple processor development because the intercommunications
among the processing elements are no longer fixed. The processing
tasks can be flexibly assigned to different processors. This dynamic
assignment, or allocation, capability also allows a new
system/software approach to fault tolerance and fault repair.

3. Multiple Microcomputer System Developments

The rapid advance of small, low-cost microcomputers has extended the
development described above into a new dimension, because a large
number of microcomputers, instead of just a few, can conceivably be
interconnected into one system. Not only can the fault tolerance of
such a system be further increased, but its computational or signal
processing capability can also be greatly enhanced by concurrent
parallel and pipeline processing.

Multiple minicomputer system development began at Carnegie-Mellon
University with the Cmmp system [172]. Although it used PDP-11
minicomputers, its tightly coupled architecture and dynamic memory
allocation concept allowed a relatively large number of processing
elements to be joined into a single system. This development was
soon followed by a tightly coupled multiple microcomputer project,
CM* [173], also at Carnegie-Mellon University. Since then, several
tightly coupled systems have been proposed [174-183]. Some of them
have gone beyond the conceptual stage, and serious hardware/software
development efforts have started. However, none had reached the
operational stage at the time of this writing.

At the same time, another direction of multiple microcomputer
development has been pursued toward the "computer network" objective
[184-188]. These systems can be distinguished from the developments
described above in the following major aspects:

• Different types of processing elements are used; in other words,
  the systems are "heterogeneous."
• The processing elements are loosely coupled.
• The bandwidth of the intercommunication buses is relatively low.
4. This Thesis Research



The second part of this thesis research develops a multiple
microcomputer system and investigates its feasibility for real-time
on-board signal/data processing in a smart sensor system. It is
similar to a number of multiple microcomputer systems developed over
the past three to four years, which allow up to 16 microcomputers to
be interconnected to perform computations. However, this system
differs considerably from them in its objectives, architecture,
intercommunication concepts, controllers, hardware buses, processing
elements, and operating system software.

This thesis project is distinguished by the following features:

a. Its objective is to provide a multitasking system that combines
fast image/signal processing capability with moderate-speed but
highly reliable signal/data processing capability for system
management, command, and control.

b. Some of the signal/data processing tasks will be performed by
tightly coupled processors, but the processors performing other
tasks do not all have to be tightly coupled. Therefore, a mixed
system of tightly and loosely coupled processors is envisioned.

c. Part of the system must perform critical tasks that require
ultra-reliability. Other parts of the system require only fail-soft
and graceful degradation performance. In either case, a dynamic
allocation capability is required that allows microcomputers to be
flexibly assigned to various tasks; this capability also provides
some fault tolerance.

To meet these requirements, a multiple star/multiple cluster system
of 16 bit microcomputers was developed. Its general concept and
philosophy were derived by a top-down system design procedure, which
is presented in the next section, III.B. That section explains how
each choice was made after considering several alternatives for
seven important issues related to the system. In Section III.C, the
detailed implementation of these choices is presented by describing
the principles and circuits of this multiple microcomputer system in
five categories:

System architecture
Processing resources
Intercommunication network
Intercommunication procedures
Multibus communication

The performance of this system is described in Section III.E.

B. DESIGN CONSIDERATIONS FOR THIS MULTIPLE 
MICROCOMPUTER SYSTEM 

1. Introduction

Although only two large multiple microcomputer systems and one
multiple minicomputer system have appeared in the literature and
reached operational status, a large number of different
architectures have been proposed, and some are in the process of
being implemented. The three operational systems are all from
Carnegie-Mellon University: CM* [172], Cvmp [191], and Cmmp [173].
There are now many options for the hardware and software design of
a multiple microcomputer system.

This thesis took a top-down system design approach to arrive at the
design choices for our system. This design process is presented in
several steps in this section to explain the general idea and
philosophy of the system. In the next section, the detailed design
of the various parts will be described.

2. Architecture

This thesis is primarily concerned with the implementation of
adaptive image processing. It is important, however, to realize
that the adaptive filter is only one part of a longer end-to-end
image processing program for detecting, tracking, and recognizing
targets in noisy images. The adaptive spatial filter is used to
enhance the target signal to noise ratio by suppressing the
background clutter; this suppression may be further enhanced by
additional image processing techniques, such as adaptive temporal
filters. The clutter suppression stage is followed by thresholding,
target acquisition, recognition, and tracking stages. These signal
processing operations are quite different. For example, adaptive
spatial filters require computing statistical image characteristics
and solving matrix equations. Adaptive thresholding requires
comparing and rearranging real numbers. Target acquisition usually
involves pattern tests on numbers based on spatial, temporal, and/or
spectral information. Therefore, although each individual signal
processing stage requires real-time or fast execution, different
signal processing stages do not depend on one another during
processing.

Furthermore, it is important to realize that processing target
signals for the mission objective is only one part, although a very
important part, of the total signal/data processing requirements of
the whole system. Other processing functions, such as management,
control, and communication, must also be implemented. The nature and
requirements of their processing operations are quite different and
vary over a wide range. Some do not need high processing speed but
demand very high reliability. Others do limited computation but
handle large amounts of data. In general, the signal/data processing
requirements of many systems cover a wide range. Therefore, we
designed an architecture with several levels of coupling among
processing elements.

At the first level, special processors may be directly coupled to a
microcomputer. At the second level, several microcomputers are
connected in parallel to the same system bus and form a "cluster." A
microcomputer can communicate with any other microcomputer on the
same bus, that is, within the same cluster, directly through common
memory. This is a tightly coupled, bus-oriented multiple
microcomputer architecture. At the third level, four clusters are
connected by way of a "complete star" bus switch network and form a
"star." Communication between microcomputers in two different
clusters is accomplished by way of the switch network. Therefore,
they are not as tightly coupled as microcomputers within a cluster,
because intercluster communication involves more overhead than
intracluster communication. However, it was found that, using
specially designed controllers for intercluster communication, the
access time was increased by only 91. These data are presented in
Section III.E. Therefore, microcomputers in different clusters within
the same star can still be considered tightly coupled. At the fourth
level, several "stars" are connected together by linking nearest
neighboring "stars" through bus switches to form a "lattice
network." Intercommunication between microcomputers in two stars is
similar to that within a star, involving one central controller and
two distributed controllers, and the overhead is practically the
same. Therefore, from the intercommunication viewpoint,
microcomputers in different stars, and indeed throughout the system,
are practically all tightly coupled. However, through programming,
they may be used with tight coupling, loose coupling, or any
combination in between to suit the requirements of the application.
3. Intercommunication and Control

Because the architecture is hierarchical, the intercommunication
processes and their controls are also hierarchical; they are also
distributed. They are hierarchical because there are three levels of
control, as shown in Table III.1.

At the lowest level, intracluster communication, no bus switch is
needed. A Random Priority Controller (RPC) is used for arbitration,
and only a small portion of the distributed controller is used,
mainly to check whether requests outside the cluster have been
granted. At the next higher level, intercluster communication, the
intrastar bus switch is used. Arbitration is accomplished by both
the distributed controller and the RPC, and only a portion of the
central controller is used to grant the intercluster request. At the
highest level, both interstar and intrastar bus switches may be
used, and all controllers (central, distributed, and random
priority) are in full action.
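The three levels of communication can be summarized in a short sketch. The Python below is a hypothetical model, not part of the original system software: a microcomputer is addressed by the lattice position of its star, its cluster (0 to 3), and its Multibus slot (0 to 7), and the names `Node`, `communication_level`, and `RESOURCES` are illustrative. The resource lists paraphrase the preceding paragraph.

```python
from dataclasses import dataclass

CLUSTERS_PER_STAR = 4      # four clusters form a star
MAX_SBCS_PER_CLUSTER = 8   # up to eight microcomputers per cluster


@dataclass(frozen=True)
class Node:
    """Hypothetical address of one microcomputer in the lattice."""
    star: tuple      # (row, column) of the star node in the 2-D lattice
    cluster: int     # 0..3 within the star
    slot: int        # 0..7 on the cluster's Multibus

    def __post_init__(self):
        if not 0 <= self.cluster < CLUSTERS_PER_STAR:
            raise ValueError("cluster index out of range")
        if not 0 <= self.slot < MAX_SBCS_PER_CLUSTER:
            raise ValueError("slot index out of range")


def communication_level(src: Node, dst: Node) -> str:
    """Classify a transfer by the lowest hierarchy level containing both ends."""
    if src.star != dst.star:
        return "interstar"
    if src.cluster != dst.cluster:
        return "intercluster"
    return "intracluster"


# Controllers and switches exercised at each level (paraphrasing the text).
RESOURCES = {
    "intracluster": ["RPC", "distributed controller (small portion)"],
    "intercluster": ["RPC", "distributed controller",
                     "central controller (portion)", "intrastar bus switch"],
    "interstar": ["RPC", "distributed controller", "central controller",
                  "interstar bus switch", "intrastar bus switch (if needed)"],
}

if __name__ == "__main__":
    a = Node((0, 0), cluster=0, slot=1)
    b = Node((0, 0), cluster=2, slot=5)
    level = communication_level(a, b)
    print(level, RESOURCES[level])
```

Only the classification is modeled here; the timing and handshaking of the real controllers are outside the scope of this sketch.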

Further, the controllers are distributed because there are four
identical RPCs and four identical distributed controllers, one of
each in every cluster. Although there is only one central
controller, it consists of four identical units, one for each
cluster. The advantages of such a distributed control system are:
(1) parallel control actions, which enhance the speed of "request
arbitration"; and (2) improved fault tolerance, because the control
actions are shared among four separate units, and should one
malfunction, the other three can continue to function.

Table III.1 Components Used in Distributed Intercommunication and Control

4. Hardware Implementation of Controllers

Controller circuits can be implemented in several ways:

a. Microprocessor control

b. Bit slice processor control

c. Digital logic circuit control.

Two performance characteristics should be considered in their choice
and design: programmability and speed. The microprocessor approach
offers the most versatile programmability but the slowest speed. The
digital logic approach offers very limited programmability but the
fastest speed of the three. An estimate was made to compare their
speeds.

In our design, the primary goal is the fastest possible response to,
and arbitration of, requests, together with the highest
communication speed. Therefore, we chose the digital logic circuit
approach. Because such hard-wired circuits are difficult to change
later, great care was given to the design of the controller concepts
and circuits to avoid unexpected changes. Further, Schottky and low
power Schottky chips are used because of their speed and power
trade-offs. CMOS chips were found to be too slow and to lack
adequate driving capability.

5. Priority Resolver 

There are several ways to arbitrate multiple requests or to resolve
priorities:

Serial (daisy chain)
Parallel
Fixed priority
Rotating priority
FIFO
Random priority

There are two primary requirements for a priority resolver circuit:
uniform and fast resolution of bus requests. In this system, an
Intel Multibus with a 10 MHz bus clock is used as the system bus. We
decided to design a priority resolver circuit that arbitrates 8
single board computers (SBCs) within one bus clock period.

The fixed priority approach was not selected because it is unable to
arbitrate multiple bus requests and grant bus usage uniformly. Test
results showed that, in our tightly coupled environment, two SBCs
were able to share the bus adequately under fixed priority, but more
than two SBCs produced unacceptable delays.

Rotating priority is much faster than the fixed priority approach. It
is able to arbitrate multiple requests and does grant bus usage
uniformly. However, it was not our final choice, because the random
priority approach was found to be faster. In the rotating priority
approach, every bus request line is tested serially, in rotating
order, for a request signal. In the worst case, the rotating
priority resolver grants the bus after N searches, where N is the
number of SBCs being arbitrated by the resolver.
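The difference between the fixed and rotating priority policies can be illustrated with a small software model. This is a behavioral sketch only, assuming one request line per SBC; the function names are hypothetical, and the serial loop stands in for the hardware scan rather than reproducing its timing.

```python
def fixed_grant(requests):
    """Fixed priority: the lowest-numbered requesting line always wins."""
    for line, pending in enumerate(requests):
        if pending:
            return line
    return None


def rotating_grant(requests, last_granted):
    """Rotating priority: test request lines one at a time, starting just
    after the line granted last. Returns (granted line or None, lines tested)."""
    n = len(requests)
    for tested in range(1, n + 1):
        line = (last_granted + tested) % n
        if requests[line]:
            return line, tested
    return None, n


if __name__ == "__main__":
    busy = [True] * 8                              # all eight SBCs request continuously
    print([fixed_grant(busy) for _ in range(4)])   # SBC 0 is granted every time
    last, order = 7, []
    for _ in range(8):
        last, _ = rotating_grant(busy, last)
        order.append(last)
    print(order)                                   # each SBC served once per round
    # Worst case: only the line granted last is requesting again, so N lines are tested.
    print(rotating_grant([False] * 7 + [True], last_granted=7))
```

With all eight SBCs requesting continuously, the fixed priority model grants SBC 0 every time, whereas the rotating model serves each SBC once per round; the last call shows the worst case of N tested lines.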
A first-in, first-out (FIFO) resolver requires memory. Because of
the time needed to access the memory, it is not possible to build a
FIFO resolver that arbitrates 8 bus requests within 100 nsec, the
bus clock period. With current technology, a fast FIFO arbiter
probably requires more than 300 nsec.

The random priority resolver is based on the binary tree synchronous
selector concept. Consider our case of 8 SBCs in a cluster, for which
three-stage selection is used. During the first stage, four out of
eight lines are checked simultaneously. In the second stage, two out
of these four lines are checked simultaneously again. The final bus
grant is made in the third stage. In other words, searching and
resolving the bus requests takes log2 N stages, compared with up to
N searches for the rotating priority resolver. Test results have
shown that the random priority resolver is able to arbitrate
multiple bus requests and grant bus usage uniformly, as demonstrated
in Fig. 3.17. The test case consisted of four SBCs simultaneously
sharing the bus in a tightly coupled environment. These four SBCs
were programmed to request the bus almost 100% of the time, and the
BPRN signals of the four SBCs are shown. A low BPRN signal indicates
that the corresponding SBC is using the system bus. The fact that
none of the four traces shows any long periods of bus usage or bus
waiting demonstrates that the random priority resolver is able to
arbitrate very heavy bus requests by these four SBCs and grant bus
usage to them "uniformly." It was found that, on average, a "bus
request" is granted in about 60 nsec.
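A software model of the binary tree selector may clarify the log2 N behavior. The sketch below assumes that, at each node of the tree, a random choice is made when both halves hold a pending request; the text does not specify how the randomness is generated in hardware, so this rule and the name `tree_grant` are assumptions for illustration.

```python
import random


def tree_grant(requests, rng=random):
    """Binary-tree selection among N = 2**k request lines in log2(N) stages.

    Assumption: at each tree node where both halves hold a pending request,
    one half is chosen at random. Returns (granted line or None, stages used).
    """
    n = len(requests)
    if n == 0 or n & (n - 1):
        raise ValueError("number of request lines must be a power of two")
    # Stage 0 candidates: the line number if it is requesting, else None.
    candidates = [line if pending else None for line, pending in enumerate(requests)]
    stages = 0
    while len(candidates) > 1:
        winners = []
        for left, right in zip(candidates[0::2], candidates[1::2]):
            if left is not None and right is not None:
                winners.append(rng.choice((left, right)))
            else:
                winners.append(left if left is not None else right)
        candidates = winners          # half as many candidates after each stage
        stages += 1
    return candidates[0], stages


if __name__ == "__main__":
    rng = random.Random(1)
    requests = [True] * 4 + [False] * 4     # four SBCs requesting almost always
    counts = [0] * 8
    for _ in range(8000):
        line, stages = tree_grant(requests, rng)
        counts[line] += 1
    print(stages, counts)                   # 3 stages; roughly 2000 grants each for SBCs 0-3
```

In this model each of the four requesting SBCs in slots 0 to 3 receives about one quarter of the grants. The uniformity depends on how the requesters are placed in the tree: with requests on lines 0, 1, 2, and 4, for example, line 4 would win half of the arbitrations in this sketch. How the actual resolver handles such cases is not described here.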

6. Bus Switches

Bus switches are one of the most important parts of a multiple
microcomputer system because they provide the means of
interconnection among the processing resources. There are two
aspects of the "bus switch" problem: the bus switch network and the
individual bus switch link.

Many switch networks have been investigated, some of which predate
the development of computers [195]. A small number of them have been
considered in multiple microcomputer development: crossbar, banyan,
hyperconcentrator, simple ring, etc.

Because of the multitask signal/data processing requirements of a
typical system, a combined approach with two levels of switching
networks was selected. At the higher level, many stars are
interconnected in a lattice architecture, with interstar bus
switches between neighboring nodes. At the lower level, each "star"
node contains four clusters, which are interconnected by a "complete
star bus switch" network. The complete star switch was chosen for
two reasons. First, the coupling within a star should be as tight as
possible, and the complete star switch connects any two clusters by
the shortest link. Second, if a link fails, the complete star switch
offers two alternative connections between the two clusters, each by
way of a third cluster over two links, thus providing some fault
tolerance.
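The fault tolerance argument for the complete star network can be checked by enumerating routes. The sketch below is a hypothetical model of one star: the four clusters are numbered 0 to 3, every pair has its own link (six links, matching the six bidirectional buses of a single star), and a failed link is simply removed; `LINKS` and `routes` are illustrative names.

```python
from itertools import combinations

CLUSTERS = range(4)
# Complete star: every pair of the four clusters has its own bus-switch link.
LINKS = {frozenset(pair) for pair in combinations(CLUSTERS, 2)}


def routes(a, b, failed=()):
    """Usable routes from cluster a to cluster b, shortest first.

    failed: iterable of links (frozensets of two cluster indices) that are down.
    """
    alive = LINKS - set(failed)
    found = []
    if frozenset((a, b)) in alive:
        found.append([a, b])                      # single direct link
    for c in CLUSTERS:                            # detours through a third cluster
        if c not in (a, b) and frozenset((a, c)) in alive and frozenset((c, b)) in alive:
            found.append([a, c, b])
    return found


if __name__ == "__main__":
    print(len(LINKS))                                  # 6 links, one per pair
    print(routes(0, 1))                                # direct link plus two detours
    print(routes(0, 1, failed=[frozenset((0, 1))]))    # the two detours remain
```

Removing the direct link between two clusters leaves exactly the two two-link detours described above.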
The key part of the individual bus switch link is the switches
themselves. For the Intel Multibus, we found that 58 of the 86 lines
should be switched. There are several choices for the switches:

Bidirectional: MOS types of switches, such as CMOS, VMOS, and DMOS.

Unidirectional: bipolar types, such as Schottky, low power Schottky,
and ECL; optoelectronic types.

Optoelectronic switches were not chosen because they are slow, with
switching times on the order of 10 µsec. Very fast switching, on the
order of several tens of nanoseconds, is required because today's
Multibus runs at 10 MHz, which corresponds to a clock period of only
100 nsec. CMOS, VMOS, and DMOS switches could provide such switching
speeds. However, they cannot supply the 15 mA or more required by
many of the control and address signals of the microcomputer.
Therefore, these MOS switches were not chosen, although their
bidirectional operation, and the low power and reliability of CMOS
switches, are extremely attractive. We chose low power Schottky
switches because of their speed and driving capability. Typical
performance is shown in Fig. 3.15, which shows the waveform of an
address signal before and after the switch. Not only is the delay
short, on the order of 25 nsec, but the waveform is also improved by
the switch because of its good driving capability of up to 50 mA.
The switch was tested with a minimum load resistance of 50 ohms and
a maximum load capacitance of 270 pF, and it continued to function
satisfactorily up to 45 MHz. One disadvantage is that two
back-to-back switch circuits are needed to switch each signal
bidirectionally. Therefore, a special circuit was designed to
provide not only the "enable" signal but also the "direction"
signal.
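The enable/direction control for a bidirectional line built from two back-to-back unidirectional drivers can be expressed as a truth table. The function below is a hypothetical sketch of that rule, not the actual circuit; it assumes one driver per direction and guarantees that the two drivers are never on at the same time.

```python
def driver_enables(enable: bool, a_to_b: bool) -> tuple:
    """Return (A->B driver on, B->A driver on) for one switched signal line.

    Hypothetical control rule: with the link disabled, both drivers are off;
    with it enabled, "direction" turns on exactly one of the two back-to-back
    unidirectional drivers, so they never drive against each other.
    """
    return (enable and a_to_b, enable and not a_to_b)


if __name__ == "__main__":
    for enable in (False, True):
        for a_to_b in (False, True):
            print(enable, a_to_b, driver_enables(enable, a_to_b))
```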

7. Processing Elements

There are two major types of processing elements on the system bus:
general purpose microcomputers and special purpose processors. The
special purpose processors can be further separated into two
subcategories. The first is a special purpose processor, such as an
array processor, that can perform several signal processing
operations, such as the fast Fourier transform (FFT), correlation,
convolution, finite impulse response filtering, infinite impulse
response filtering, etc. The second is a special purpose processor
designed to perform only one signal processing operation, such as
the FFT.

a. General Purpose Microcomputer

It was decided that all general purpose microcomputers used in our
system should be treated homogeneously. This is necessary because
two major principles of our operating system, the "virtual
processor" [189] and "dynamic process allocation" [190] concepts,
require homogeneous processing elements.

b. Special Purpose Processors

It was decided that special purpose processors cannot be treated in
the same way as the microcomputers. However, it has not yet been
decided exactly how these special purpose processors should be
handled. There are two main alternatives. In one, a special purpose
processor is treated as an I/O port managed by the operating system.
In the other, a special purpose processor is operated in a "slave"
mode on the system bus.

8. Mode of Data Transfer

In most multiple processor systems, the basic mode of data transfer
is "message transfer" communication. However, a basic philosophy of
our operating system is a "loop free" structure, which requires
frequent references to synchronization primitives. In other words,
the operating system program on a microcomputer needs to reference
synchronization primitives located in either internal or external
global memories. These "references" are executed via the system
bus. If data transfer is "message" based, the synchronization of
processes could be delayed because the system bus is occupied by a
long message transfer. To avoid such a delay, it was decided that
the basic mode of data transfer should be "word transfer." This
allows several microcomputers to reference their synchronization
primitives and other data in an "interleaved" mode.
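The delay argument for word transfer can be made concrete with a deliberately simplified timing model. In the sketch below, time is counted in bus cycles, one word moves per cycle, and the function name `sync_wait` and both arbitration rules are assumptions for illustration rather than measured Multibus behavior.

```python
import math


def sync_wait(block_words: int, request_time: float, word_mode: bool) -> float:
    """Bus cycles processor A waits before starting a one-word synchronization
    reference while processor B moves a block of block_words words that
    started at t = 0, one word per bus cycle.

    Simplified, hypothetical model: in message mode B keeps the bus for the
    whole block; in word mode the bus is re-arbitrated after every word and
    A wins the next arbitration.
    """
    if not 0 <= request_time < block_words:
        return 0.0                                            # bus is free when A asks
    if word_mode:
        return float(math.ceil(request_time) - request_time)  # finish current word only
    return float(block_words - request_time)                  # wait for the entire block


if __name__ == "__main__":
    print(sync_wait(256, 10, word_mode=False))   # message mode
    print(sync_wait(256, 10, word_mode=True))    # word (interleaved) mode
```

For a hypothetical 256-word block, a synchronization reference issued 10 cycles into the transfer waits 246 cycles in message mode but less than one cycle in word mode.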

However, data can still be transferred in "blocks" if required. This
is accomplished by a special feature of the Intel 16 bit 8086
microprocessor, which can generate a bus lock signal of a duration
specified by software. This bus lock signal holds the bus until the
block transfer is complete. Thus, message-based data transfer is
possible as well.

C. DESCRIPTION OF THIS MULTIPLE MICROCOMPUTER SYSTEM

1. Introduction

In the last section, we presented the reasons for choosing the
specific approaches for various parts of our multiple microcomputer
system, based on a top-down design procedure to meet the
requirements of this type of smart sensor system. In this section, a
more detailed description will be given to explain how those choices
are implemented. The presentation is organized in five major
categories:

System architecture (Section C.2)

Processing resources (Section C.3)

Intercommunication network (Section C.4)

Intercommunication procedures among resources in different clusters
and stars (Section C.5)

Multibus communication (Section C.6)

Performance of this multiple microcomputer system will be presented
in Section D.

2. System Architecture

The topology of this system consists of many "star" nodes, each
linked to its nearest neighbor stars. A two dimensional example is
shown in Fig. 3.1. Each star has four links connected to its four
neighbors. The links are bidirectional system buses with a bus
switch, called the "inter-star bus switch" (ISBSW). The "bus switch"
consists of 60 bidirectional switches for 60 signal lines. Two types
of switches have been investigated: one with latches and one without
latches for the signal lines.

Figure 3.1 Two Dimensional Lattice Architecture of Multiple Star
Multiple Microcomputer System

Each "star" consists of four clusters interconnected by a complete
star "bus switch network." Each "cluster" consists of up to eight
microcomputers. Other processing elements and one or more RAM boards
are also connected to the system Multibus. Fig. 3.2 depicts the
topology of a single star with four clusters. In this example, the
bus switch network consists of six bidirectional system buses, each
with a bus switch, interconnected as shown in Fig. 3.7.

3. Processing Resources 

Two types of processing resources are used in this system.

a. Basic Processing Elements - SBC 8612A 

Intel's 16 bit single board microcomputers, SBC 
8612A, are used as the basic processing elements. A block 
diagram of the SBC 8612A is shown in Fig. 3.3. 

(1) The Single Board Microcomputer SBC-8612A

The iSBC 8612A Single Board Computer is a 16 bit single board
computer, a complete computer system on a single printed circuit
assembly. The iSBC 8612A board includes a 16 bit central processing
unit (CPU), up to 32K bytes of dynamic RAM, a serial communications
interface, three programmable parallel I/O ports, three programmable
timers, priority interrupt control, Multibus interface control
logic, and bus expansion drivers for interfacing with other
Multibus-compatible expansion boards. Also included is dual port
control logic that allows the iSBC 8612A board to act as a slave RAM
device to other Multibus masters in the system. Provision is made
for user installation of up to 16K bytes of read only memory.

Figure 3.2 The topology of a single star.

Figure 3.3 Architecture of the Intel Single Board Computer 8612

The iSBC 8612A Single Board Computer is 
controlled by an Intel 8086 16 bit microprocessor (CPU).

The 8086 CPU includes four 16 bit general purpose registers 
that may also be addressed as eight 8 bit registers. In 
addition, the CPU contains two 16 bit pointer registers and 
two 16 bit index registers. Four 16 bit segment registers 
allow extended addressing to a full megabyte of memory. The 
CPU instruction set supports a wide range of addressing modes 
and data transfer operations, signed and unsigned 8 bit and 
16 bit arithmetic including hardware multiply and divide, and 
logical and string operations. The CPU architecture features
dynamic code relocation, reentrant code, and instruction
lookahead.
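The segment registers are what let 16 bit registers reach a full megabyte: the 8086 forms a 20 bit physical address by shifting a 16 bit segment value left four bits and adding a 16 bit offset. The short Python sketch below illustrates only that arithmetic; the function name is illustrative and not part of any Intel software.

```python
def physical_address(segment: int, offset: int) -> int:
    """Form an 8086 20-bit physical address from a segment and an offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # The segment value is shifted left four bits (multiplied by 16) and the
    # offset is added; on the 8086 the sum wraps around at 2**20 (1 megabyte).
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0 (wraps past 1 megabyte)
```

Because many segment:offset pairs map to the same physical byte, the segment registers can be reloaded at run time, which is what makes the dynamic code relocation mentioned above practical.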

The iSBC 8612A board has an internal bus for
all on-board memory and I/O operations and accesses the system
bus (Multibus interface) for all external memory and I/O
operations. Hence, local (on-board) operations do not involve
the Multibus interface, leaving the Multibus interface
available for true parallel processing when several bus masters
(e.g., DMA devices and other single board computers) are
used in a multimaster scheme.

Dual port control logic is included to 
interface the dynamic RAM with the Multibus interface so 
that the iSBC 8612A board can function as a slave RAM device 
when not in control of the Multibus interface. The CPU has 
priority when accessing on-board RAM. After the CPU com- 
pletes its read or write operation, the controlling bus mas- 
ter is allowed to access RAM and complete its operation. 

Where both the CPU and the controlling bus master need to
write or read several bytes or words to or from on-board
RAM, their operations are interleaved. For CPU access,
the on-board RAM addresses are assigned upward from the
bottom of the 1 megabyte address space, i.e., 00000H-07FFFH. The
slave RAM address decode logic includes jumpers and switches
to allow positioning the on-board RAM into any 128K segment
of the 1 megabyte system address space.
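The two views of the same dual ported RAM can be made concrete with a small sketch: the on-board CPU always sees its RAM starting at 00000H, while other bus masters see a slave window whose placement is set by jumpers. The function names are hypothetical, and placing the window at the bottom of its 128K segment is an assumption made for illustration, not a statement about the board's jumper options. Only the base-board window sizes (8K to 32K) are accepted.

```python
KB = 1024
SEGMENT = 128 * KB           # slave RAM can be placed in any 128K segment
SYSTEM_SPACE = 1024 * KB     # 1 megabyte system address space


def local_range(ram_bytes: int) -> tuple:
    """Addresses the on-board CPU uses: RAM starts at 00000H and grows upward."""
    return 0, ram_bytes - 1


def slave_range(segment_index: int, window_bytes: int) -> tuple:
    """Multibus addresses of a slave RAM window placed in a 128K segment.

    Assumption: the window starts at the bottom of the chosen segment.
    """
    if not 0 <= segment_index < SYSTEM_SPACE // SEGMENT:
        raise ValueError("1 megabyte holds only eight 128K segments")
    if window_bytes not in (8 * KB, 16 * KB, 24 * KB, 32 * KB):
        raise ValueError("slave window must be 8K, 16K, 24K, or 32K")
    base = segment_index * SEGMENT
    return base, base + window_bytes - 1


lo, hi = local_range(32 * KB)
print(f"{lo:05X}H-{hi:05X}H")          # 00000H-07FFFH
lo, hi = slave_range(3, 16 * KB)
print(f"{lo:05X}H-{hi:05X}H")          # 60000H-63FFFH
```

With a 16K slave window, the upper 16K of a 32K board stays private to the on-board CPU, which is the split between shared and reserved RAM described in the next paragraph.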

The slave RAM can be configured to allow
either 8K, 16K, 24K, or 32K access by another bus master.

If the iSBC 300 Multimodule RAM option is installed, the 
memory increments are 16K, 32K, 48K, or 64K. Thus, the RAM 
can be configured to allow other bus masters to access a 
segment of the on-board RAM and still reserve another segment 
strictly for on-board use. The addressing scheme accommodates 
both 16 bit and 20 bit addressing. 
Four IC sockets are included to accommodate
up to 16K bytes of user-installed read only memory.
Configuration jumpers allow read only memory to be installed
in 2K, 4K, or 8K increments.

The iSBC 8612A board includes 24 program- 
mable parallel I/O lines implemented by means of an Intel 
8255A Programmable Peripheral Interface (PPI) . The system 
software is used to configure the I/O lines in any combina- 
tion of unidirectional input/output and bidirectional ports. 

The I/O interface may be customized to meet specific periph- 
eral requirements and, in order to take full advantage of the 
large number of possible I/O configurations, IC sockets are 
provided for interchangeable I/O line drivers and terminators. 
Hence, the flexibility of the parallel I/O interface is fur- 
ther enhanced by the capability of selecting the appropriate 
combination of optional line drivers and terminators to pro- 
vide the required sink current, polarity, and drive/termination 
characteristics for each application. The 24 programmable 
I/O lines and signal ground lines are brought out to a 50 pin 
edge connector (J1) that mates with flat, woven, or round
cable. 

The RS232C compatible serial I/O port is 
controlled and interfaced by an Intel 8251A USART (Universal 
Synchronous/Asynchronous Receiver/Transmitter) chip. The 
USART is individually programmable for operation in most 
synchronous or asynchronous serial data transmission formats 
(including IBM Bi-Sync).
In the synchronous mode the following are programmable:

a. Character length,

b. Sync character (or characters), and

c. Parity.

In the asynchronous mode the following are programmable:

a. Character length,

b. Baud rate factor (clock divide ratios of 1, 16, or 64),

c. Stop bits, and

d. Parity.
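On the 8251A, the asynchronous parameters listed above are selected by a single mode instruction byte written after reset. The sketch below packs that byte. The bit positions follow Intel's 8251A data sheet (baud rate factor in D1-D0, character length in D3-D2, parity enable in D4, even/odd select in D5, stop bits in D7-D6) and should be checked against it before use; the function and table names are illustrative only.

```python
from typing import Optional

BAUD_FACTOR = {1: 0b01, 16: 0b10, 64: 0b11}         # D1-D0 (00 selects sync mode)
CHAR_LENGTH = {5: 0b00, 6: 0b01, 7: 0b10, 8: 0b11}   # D3-D2
STOP_BITS = {1: 0b01, 1.5: 0b10, 2: 0b11}            # D7-D6


def async_mode_byte(char_len: int, baud_factor: int, stop_bits: float,
                    parity: Optional[str]) -> int:
    """Build an 8251A asynchronous mode instruction byte.

    parity is None (disabled), "odd", or "even".
    """
    byte = BAUD_FACTOR[baud_factor] | (CHAR_LENGTH[char_len] << 2)
    if parity is not None:
        if parity not in ("odd", "even"):
            raise ValueError("parity must be None, 'odd', or 'even'")
        byte |= 1 << 4                  # D4: parity enable (PEN)
        if parity == "even":
            byte |= 1 << 5              # D5: even parity (EP); 0 selects odd
    byte |= STOP_BITS[stop_bits] << 6
    return byte


print(hex(async_mode_byte(8, 16, 1, None)))    # 0x4e
print(hex(async_mode_byte(7, 64, 2, "even")))  # 0xfb
```

The first call gives the common "8 data bits, no parity, 1 stop bit, 16x clock" setting.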

In both the synchronous and asynchronous 
modes, the serial I/O port features half- or full-duplex, 
double buffered transmit and receive capability. In addi- 
tion, USART error detection circuits can check for parity, 
overrun, and framing errors. The USART transmit and receive 
clock rates are supplied by a programmable baud rate/time 
generator. These clocks may optionally be supplied from an 
external source. The RS232C command lines, serial data lines, 
and signal ground lines are brought out to a 50 pin edge con- 
nector (J2) that mates with flat or round cable. 

Three independent, fully programmable 16 bit 
interval timer/event counters are provided by an Intel 8253 
Programmable Interval Timer (PIT) . Each counter is capable 
of operating in either BCD or binary modes; two of these 
counters are available to the system’s designer to generate 
accurate time intervals under software control. The outputs
and gate/trigger inputs of two of these counters may be
independently routed to the 8259A Programmable Interrupt
Controller (PIC). The gate/trigger inputs of the two
counters may be routed to I/O terminators associated with
the 8255A PPI or as input connections from the 8255A PPI.

The third counter is used as a programmable baud rate gener- 
ator for the serial I/O port. In utilizing the iSBC 8612A 
board, the systems designer simply configures, via software, 
each counter independently to meet system requirements. 
Whenever a given time delay or count is needed, software
commands the 8253 PIT to select the desired function.

The contents of each counter may be read at any time during 
system operation with simple operations for event counting 
applications, and special commands are included so that the 
contents of each counter can be read "on the fly."
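Because the third counter serves as the baud rate generator, the count to load follows from dividing the counter's input clock by the USART clock, which is the baud rate times the 1, 16, or 64 factor. The 1.2288 MHz input clock below is a hypothetical value chosen for illustration and is not taken from the text; the function name is also illustrative.

```python
def baud_divisor(clock_hz: float, baud: int, factor: int) -> int:
    """Count to load into the 8253 counter that clocks the 8251A USART.

    The USART needs a clock of baud * factor (factor = 1, 16, or 64), so the
    counter divides its input clock by clock_hz / (baud * factor).
    """
    exact = clock_hz / (baud * factor)
    count = round(exact)
    if count < 1 or abs(exact - count) / exact > 0.01:
        raise ValueError("no 16-bit count gives this rate within 1%")
    if count > 65536:            # a loaded 0 counts 65536 in binary mode
        raise ValueError("rate too low for a 16-bit counter")
    return count


ASSUMED_CLOCK_HZ = 1_228_800     # hypothetical 8253 input clock
print(baud_divisor(ASSUMED_CLOCK_HZ, 9600, 16))   # 8
print(baud_divisor(ASSUMED_CLOCK_HZ, 300, 64))    # 64
```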

The iSBC 8612A board provides vectoring for 
bus vectored (BV) and non-bus vectored (NBV) interrupts. An 
on-board Intel 8259A Programmable Interrupt Controller (PIC) 
handles up to eight NBV interrupts. By using external PICs 
slaved to the on-board PIC (master) , the interrupt structure 
can be expanded to handle and resolve the priority of up to 
64 BV sources. 

The PIC, which can be programmed to respond 
to edge-sensitive or level-sensitive inputs, treats each 
"true” input signal condition as an interrupt request. After 
resolving the interrupt priority, the PIC issues a single 
interrupt request to the CPU. Interrupt priorities are 
independently programmable under software control. The
programmable interrupt priority modes are:

(a) Nested Priority. Each interrupt
request has a fixed priority: input 0 is highest, input 7
is lowest.

(b) Fully Nested Priority. This mode is
the same as the nested mode, except that when a slave PIC is
being serviced, it is not locked out from the master PIC
priority logic, and when exiting from the interrupt service
routine, the software must check for pending interrupts from
the slave PIC just serviced.

(c) Auto-Rotating Priority. Each interrupt
request has equal priority. Each level, after receiving
service, becomes the lowest priority level until the next
interrupt occurs.

(d) Specific Priority. Software assigns
the lowest priority. The priority of all other levels is in
numerical sequence based on the lowest priority.

(e) Special Mask. Interrupts at the level
being serviced are inhibited, but all other levels of
interrupts (higher and lower) are enabled.

(f) Poll. The CPU internal interrupt
enable is disabled. Interrupt service is achieved by
programmer initiative using a Poll command.
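Modes (a), (c), and (d) differ only in which level is treated as the lowest priority, so a single selection rule covers all three. The following Python sketch is a simplified model written for illustration, not Intel's logic: it ignores masking, in-service tracking, and cascaded slave PICs, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def next_level(pending: Iterable[int], lowest: int = 7) -> Optional[int]:
    """Return the IR level (0-7) an 8259A-style resolver would service next.

    `lowest` is the level that currently has the lowest priority; priority then
    runs in numerical sequence starting at lowest + 1 (mod 8). lowest = 7 gives
    the fixed nested order (IR0 highest, IR7 lowest). Masking and in-service
    bits are ignored in this simplified sketch.
    """
    requests = set(pending)
    for step in range(1, 9):
        level = (lowest + step) % 8
        if level in requests:
            return level
    return None


# Auto-rotating priority: each serviced level becomes the lowest priority.
lowest = 7
serviced = []
for _ in range(3):
    level = next_level({2, 5}, lowest)
    serviced.append(level)
    lowest = level
print(serviced)  # [2, 5, 2]
```

With two inputs requesting continuously, rotation alternates service between them, whereas the fixed nested order would keep servicing IR2.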

The CPU includes a non-maskable interrupt 
(NMI) and a maskable interrupt (INTR) . The NMI interrupt is 
intended to be used for catastrophic events such as power
outages that require immediate action of the CPU. The INTR 
interrupt is driven by the 8259A PIC which, on demand, pro- 
vides an 8 bit identifier of the interrupting source. The 
CPU multiplies the 8 bit identifier by four to derive a 
pointer to the service routine for the interrupting device. 
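Multiplying the identifier by four works because the 8086 interrupt vector table, in the lowest 1K bytes of memory (00000H-003FFH), holds a 4-byte entry per interrupt type: a 16-bit offset (IP) followed by a 16-bit segment (CS). A minimal sketch of that lookup follows; the function names and the vector value stored in the example table are made up for illustration.

```python
def vector_address(interrupt_type: int) -> int:
    """Address of the 4-byte vector-table entry for an 8-bit interrupt type."""
    if not 0 <= interrupt_type <= 0xFF:
        raise ValueError("the interrupt identifier is 8 bits")
    return interrupt_type * 4


def read_vector(memory: bytes, interrupt_type: int) -> tuple:
    """Return (CS, IP) for an interrupt type from a table starting at 00000H.

    Each entry holds the offset (IP) in its first word and the segment (CS) in
    its second word, both little-endian.
    """
    addr = vector_address(interrupt_type)
    ip = int.from_bytes(memory[addr:addr + 2], "little")
    cs = int.from_bytes(memory[addr + 2:addr + 4], "little")
    return cs, ip


memory = bytearray(1024)                              # the 1K-byte vector table
memory[0x80:0x84] = bytes([0x34, 0x12, 0x00, 0xF0])   # type 20H -> F000:1234
print(hex(vector_address(0x20)))                      # 0x80
print([hex(v) for v in read_vector(bytes(memory), 0x20)])  # ['0xf000', '0x1234']
```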

Interrupt requests may originate from 18
sources without the necessity of external hardware. Two
jumper-selectable interrupt requests can be automatically 
generated by the Programmable Peripheral Interface (PPI) when 
a byte of information is ready to be transferred to the 8086 
CPU (i.e., input buffer is full) or a byte of information has 
been transferred to a peripheral device (i.e., output buffer 
is empty). Two jumper-selectable interrupt requests can be
automatically generated by the USART when a character is 
ready to be transferred to the 8086 CPU (i.e., receive channel 
buffer is full) or when a character is ready to be transmitted 
(i.e., transmit channel data buffer is empty). A jumper- 
selectable interrupt request can be generated by two of the 
programmable counters and eight additional interrupt request 
lines are available to the user for direct interfaces to 
user designated peripheral devices via the Multibus interface. 
One interrupt request line may be jumper routed directly from 
a peripheral via the parallel I/O driver/terminator section 
and one power fail interrupt may be input via auxiliary 
connector P2. 
The iSBC 8612A board includes the resources
for supporting a variety of original equipment manufacturer 
system requirements. For those applications requiring addi- 
tional processing capacity and the benefits of multiprocessing 
(i.e., several CPUs and/or controllers logically sharing 
systems tasks with communication over the Multibus interface) , 
the iSBC 8612A board provides full bus arbitration control 
logic. This control logic allows up to three bus masters 
(e.g., combination of iSBC 8612A board, DMA controller, 
diskette controller, etc.) to share the Multibus interface 
in serial (daisy-chain) fashion or up to 16 bus masters to 
share the Multibus interface using an external parallel pri- 
ority resolving network. 

The Multibus interface arbitration logic 
operates synchronously with the bus clock, which is derived 
either from the iSBC 8612A board or can be optionally gen- 
erated by some other bus master. Data, however, are trans- 
ferred via a handshake between the controlling master and the 
addressed slave module. This arrangement allows different 
speed controllers to share resources on the same bus, and 
transfers via the bus proceed asynchronously. Thus, the 
transfer speed is dependent on transmitting and receiving 
devices only. This design prevents slower master modules 
from being handicapped in their attempts to gain control of 
the bus, but does not restrict the speed at which faster 
modules can transfer data via the same bus. The most obvious
applications for the master-slave capabilities of the bus
are multiprocessor configurations, high speed direct memory
access (DMA) operations, and high speed peripheral control,
but they are by no means limited to these three.

Adding the optional iSBC 300 Multimodule
RAM to the iSBC 8612A board allows the on-board RAM to be
expanded by 32K (for an on-board total of 64K). If the
optional iSBC 340 Multimodule EPROM is installed on the iSBC 
8612A board, the amount of on-board ROM/EPROM can be expanded 
by 16K (for an on-board total of 32K) . 

b. Special Processing Elements 

Special purpose processing elements will also 
be used in this system to enhance processing capabilities. 
Typical examples are array processors, FFT processors, correlators, etc.
However, they have not been included in this thesis project. 

c. Memories 

Three types of memories are provided. 

(1) Secondary Memory . It consists of two mag- 
netic cartridge hard discs and a dual drive floppy diskette 
system. The magnetic hard disc is manufactured by the DYNEX 
Company and has a storage capacity of 10 megabytes. This
hard disc system is connected to the system Multibus, which
allows a fast data transfer rate and provides DMA capability.
Its interface to the Multibus is made by the Interphase Corp.

The dual floppy diskette drive is a part of the Intel MDS-220 
development system. 
(2) Primary Memory. It consists of dynamic
RAM and EPROM (Erasable Programmable Read Only Memory). The
EPROMs reside in each SBC (8K bytes to 16K bytes per SBC). They
can be used for monitor storage and to store part of the
operating system. The RAMs reside in two types of physical
locations. The first location is on each SBC and has a
capacity of up to 64K bytes. The second type of location is on
separate RAM boards. A 128K byte RAM board developed by the
MUPRO Company is used. The RAM in the SBC is a dual ported
RAM which can be shared with other SBCs via the Multibus
interface. Part or all of the dual ported RAM can be made
accessible only to the on-board CPU; in other words, made
"private" and "unshared" to the SBC. The stand-alone RAM
boards are shared with other SBCs via the Multibus interface.

d. Memory Hierarchy

The primary memory is partitioned
according to the following hierarchical scheme.

A) Private Unshared Memory - RAMs available on each SBC
which can be accessed only by the on-board CPU.

B) Internal Global Shared Memory - Internal global
shared RAM available on each SBC and on special RAM
boards. The on-board RAM in the SBC is a dual
ported RAM and can be accessed by any SBC which
is a member of that cluster (inaccessible to PEs
in other clusters). See Section C.5.a.1.

C) External Global Shared Memory - External global
shared RAMs reside in special RAM boards and/or in the
dual ported RAM of the SBCs. These memories can be
accessed by any SBC in the same "star," and by any
SBC in the corresponding clusters in neighboring
stars.
Using this memory hierarchy, the total address
space can be expanded beyond the physical memory address space
of each CPU. The 8086 microprocessor has 20 address lines, so
its physical address space is $2^{20} = 1{,}048{,}576$ bytes, or
1M bytes.

In this implementation, the total address space
(memory space) for a single star is partitioned in the
following way (1K bytes = 1024 bytes):

(1) Private Memory

6 µC in each cluster:
$$2 \cdot 64\text{K} + 4 \cdot (64\text{K} - 8\text{K}) = 352\text{K bytes/cluster} = 360{,}448 \text{ bytes/cluster}$$

8 µC in each cluster:
$$4 \cdot 64\text{K} + 4 \cdot (64\text{K} - 8\text{K}) = 480\text{K bytes/cluster} = 491{,}520 \text{ bytes/cluster}$$

(2) Internal Global Memory (6 or 8 µC per cluster)
$$\tfrac{3}{4} \cdot 1\text{M bytes} = 768\text{K bytes/cluster} = 786{,}432 \text{ bytes/cluster}$$

(3) External Global Memory (6 or 8 µC per cluster)
$$32\text{K bytes/cluster} = 32{,}768 \text{ bytes/cluster}$$

As described before, a "star" consists of four clusters,
so the total memory space for a single star is:

6 µC per cluster:
$$4 \cdot (352\text{K} + 768\text{K} + 32\text{K}) = 4{,}608\text{K bytes/star} = 4{,}718{,}592 \text{ bytes/star}$$

8 µC per cluster:
$$4 \cdot (480\text{K} + 768\text{K} + 32\text{K}) = 5{,}120\text{K bytes/star} = 5{,}242{,}880 \text{ bytes/star}$$

This expanded memory space can be determined in general as

$$MS = CL \cdot \left( \sum_{i=1}^{N} PM_i + GIM + GEM \right) \qquad (3.0)$$

where

MS = memory space,
CL = number of clusters in a "star,"
PM_i = private memory of the i-th SBC, in K bytes,
GIM = global internal memory, in K bytes,
GEM = global external memory, in K bytes,
N = number of SBCs in a cluster.



If all SBCs are assigned the same amount of private memory,
PM, then (3.0) becomes

$$MS = CL \cdot (N \cdot PM + GIM + GEM) \qquad (3.1)$$



The memory space is computed for both 6 and 8 microcomputers
per cluster mainly because of power supply considerations: the
available power supply can handle up to 6 SBCs in a cluster,
whereas the controller for intercommunication is designed for
8 SBCs.
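Equation (3.0) and the two per-star totals above can be checked with a few lines of Python. The per-SBC private allocations (two or four SBCs with a full 64K, the rest with 64K - 8K) are read off the private-memory expressions; the function name is illustrative.

```python
KB = 1024


def memory_space(clusters: int, private_kb: list, gim_kb: int, gem_kb: int) -> int:
    """Equation (3.0): MS = CL * (sum of PM_i + GIM + GEM), in K bytes."""
    return clusters * (sum(private_kb) + gim_kb + gem_kb)


six = [64] * 2 + [64 - 8] * 4      # 6 SBCs per cluster
eight = [64] * 4 + [64 - 8] * 4    # 8 SBCs per cluster
for label, pm in (("6 uC/CL", six), ("8 uC/CL", eight)):
    ms = memory_space(4, pm, 768, 32)
    print(label, ms, "K bytes =", ms * KB, "bytes")
# 6 uC/CL 4608 K bytes = 4718592 bytes
# 8 uC/CL 5120 K bytes = 5242880 bytes
```

Passing a list of identical PM values reduces the sum to N · PM, which is the special case (3.1).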

4. Intercommunication Network

In order to establish fast, reliable, and highly
fault-tolerant communication among SBCs of different
clusters and stars, three levels of communication controllers
were designed, built and tested. They include a combination
of random priority, distributed, and central controllers, as
shown in Fig. 3.4 for a single star. Each cluster has its
own distributed controller, so each star has four such
controllers. The four clusters share one central controller.
The four distributed controllers are identical and have some
degree of programmability.

a. Distributed Controllers (DC) 

A block diagram of the distributed controller is 
depicted in Fig. 3.5. It resides on a single board located 
in each cluster. Its primary functions are the following: 

1) Arbitration among Internal/External bus requests 
from within and outside the cluster. 

2) Priority resolving. 

3) Intra-cluster advance activities monitoring.

4) Interacting with the central controller. 

5) Deadlock avoidance. 

b. Random Priority Controller (RPC) 

The RPC is a bus contention resolver based on
a binary tree approach. The RPC accepts up to eight "Bus
Requests" (BREQ) and issues a single "Bus Priority In" (BPRN)
signal. BREQ is a signal generated by the bus arbiter which
resides on-board the SBC to indicate that this particular SBC
requires control of the cluster system bus (Multibus) for
one or more data transfers. BPRN is a signal generated by
the RPC to indicate to the requesting SBC that control of the
cluster bus is granted. Prior to issuance of a BPRN, the
RPC generates an "advanced bus priority in" signal (BPRN*),
which is sent to the ICAAM (intra-cluster advance activities
monitor) as a "port selector" signal. This signal starts
a chain of logical activities which eventually causes the DAC
(deadlock avoidance circuit) to send two signals, BHD
(bus hold) and PRE (priority enable), to the RPC. When the
appropriate BHD and PRE are received by the RPC, it
generates the BPRN signal. BHD is a positive logic signal
which enables the tristate output of the RPC to allow BPRN*
to propagate and become a BPRN signal when the PRE signal
is enabled. If BHD goes low, it disables all BPRN*. PRE is
a negative logic signal which is generated in the DAC circuit.
When the PRE signal is generated, it disables requests from
other clusters and enables the output driver of the RPC to
send the BPRN.

Figure 3.4 Diagram of a Three Level Control for a Four-Cluster
Multiple Microcomputer System

Figure 3.5 Schematic Diagram of Distributed Controller
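The RPC's two stages, tree arbitration to pick a BPRN* candidate and gating of that candidate by BHD and PRE, can be sketched as follows. Interpreting "random priority" as a random tie-break at each 2-input node is our assumption; the function names are hypothetical, and the circuit's internal clocking is not modeled.

```python
import random
from typing import Optional, Sequence


def tree_arbitrate(breq: Sequence[bool], rng: random.Random) -> Optional[int]:
    """Select one of eight bus requests with a binary tree of 2-input arbiters.

    Each node passes on the winner of its two children; when both request, a
    random bit breaks the tie (our reading of "random priority").
    """
    if len(breq) != 8:
        raise ValueError("the RPC has eight BREQ inputs")
    level = [i if req else None for i, req in enumerate(breq)]
    while len(level) > 1:                      # 8 -> 4 -> 2 -> 1 nodes
        winners = []
        for left, right in zip(level[0::2], level[1::2]):
            if left is not None and right is not None:
                winners.append(left if rng.random() < 0.5 else right)
            else:
                winners.append(left if left is not None else right)
        level = winners
    return level[0]                            # the BPRN* candidate


def bprn(candidate: Optional[int], bhd: bool, pre_n: bool) -> Optional[int]:
    """BPRN* becomes BPRN only while BHD is high and PRE (active low) is low."""
    if candidate is not None and bhd and not pre_n:
        return candidate
    return None


rng = random.Random(1)
requests = [False, True, False, False, True, False, False, False]  # inputs 1 and 4
candidate = tree_arbitrate(requests, rng)
print(candidate in (1, 4), bprn(candidate, bhd=True, pre_n=False) == candidate)  # True True
```

A tree of 2-input nodes needs only three levels for eight inputs, so the selection delay grows with log2 of the number of SBCs rather than linearly, as a daisy chain would.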

The RPC has an internal clock to synchronize its
arbitration function. More details can be found in Section
C.4.b.

ICAAM (Intra-Cluster Advance Activities Monitor) 
has a multiplexer which selects two signals, MSBT (most 
significant address bits, 5 bits out of 20) and ADRDC/ADWTC 
(advance read command/advance write command) when a BPRN* 
is received from the RPC. By analysing the MSBT, the ICAAM 
generates a bus request of one of the following types:
1) Intra-cluster bus request. It is a request for the
system bus in the same cluster only. In response
to this request, the ICAAM generates an IREQ signal.

2) Inter-cluster bus request. It is one of four
cluster requests (CLREQ*) generated by the ICAAM of the
distributed controller. Each CLREQ* requests
three resources: one system bus of the requesting
cluster, one system bus of the requested cluster 
and one inter-connecting bus switch. Following a 
CLREQ*, the ICAAM also creates an EXREQ for the CIC 
(coincidence inhibit circuit) . 

3) Inter-star bus request. This request, labeled
STREQ*, involves three resources: the system bus
of a cluster in the requesting star, the system
bus of the corresponding cluster in the requested
star, and the inter-connecting bus switch between
these two stars. Following a STREQ* signal, the
ICAAM also creates an EXREQ for the CIC.
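Because the ICAAM decides the request type from only the five most significant of the 20 address bits (the MSBT), the address space is effectively divided into 32 blocks of 32K bytes each. The sketch below shows that decode. The block-to-target table is entirely hypothetical and stands in for whatever address map the system configuration actually defines; the names are illustrative.

```python
from enum import Enum


class Request(Enum):
    IREQ = "intra-cluster"
    CLREQ = "inter-cluster"
    STREQ = "inter-star"


def msbt(address: int) -> int:
    """Top 5 of the 20 address bits; each value selects one 32K-byte block."""
    return (address >> 15) & 0x1F


# HYPOTHETICAL map from MSBT values to remote targets; the real assignment
# depends on how the system's global memory is configured.
HYPOTHETICAL_MAP = {
    0x18: (Request.CLREQ, "cluster B"),
    0x19: (Request.CLREQ, "cluster C"),
    0x1A: (Request.CLREQ, "cluster D"),
    0x1F: (Request.STREQ, "neighboring star"),
}


def classify(address: int, address_map: dict = HYPOTHETICAL_MAP) -> tuple:
    """Return the request type and target the ICAAM would derive from an address."""
    return address_map.get(msbt(address), (Request.IREQ, "own cluster"))


print(classify(0x12345))   # own cluster -> IREQ
print(classify(0xC0010))   # block 18H -> CLREQ to cluster B (hypothetical)
```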



The ICAAM also generates an advance read command
(ADRDC) or advance write command (ADWTC) before the
corresponding read command (MRDC) or write command (MWTC) is
generated by the bus controller of the requesting SBC. This
is done by monitoring the activities of the CPU of the
requesting SBC before the CPU is granted the system bus. Those
signals are needed to determine the direction of the drivers 
in the bus switch in advance, so that all switching transients 
are settled before a data transfer takes place. 

CIC (Coincidence Inhibiter Circuit) - The CIC
accepts five signals as inputs: one STPRN (star priority in)
and three CLPRN (cluster priority in) from the central
controller, and one IREQ/EXREQ from the ICAAM. It generates
one output signal, INH (inhibit), for the DAC (deadlock
avoidance circuit). The primary function of the CIC is to
inhibit a BPRN from the RPC when a CLREQ* or STREQ* has been
issued by the ICAAM, until either a CLPRN or a STPRN is
granted by the central controller to the CIC. The INH signal
prevents the system bus from being tied up while waiting for
the inter-cluster request to be granted, which allows
efficient bus usage and reduces bus contention.

DAC (Deadlock Avoidance Circuit). A "deadlock"
is a situation in which two processes are unknowingly
waiting for resources that are held by each other and thus
unavailable [192]. More details can be found in Section C.5.d.,e.
The primary function of the DAC is to prevent deadlock. Its
principle is similar to the "Suspend" Lock method [Ref. 193].
The DAC accepts four input signals, ANREQ (any request),
INH, STREQ, and CLREQ, and generates three signals: BHD (bus
hold), PRE (priority enable), and CL/STPRN. Three cases are
described below to explain the operation of the DAC, depending
on the order in which the CLREQ (or STREQ) and the INH
signals occur.

(Case 1) - If a CLREQ (or STREQ) occurs prior to
the INH signal, the CL/STPRN signal will be granted. In this
case, BHD goes low and PRE goes high, thus freezing the selected
request in the RPC and disabling the BPRN*, which releases
all the resources held by the appropriate SBC via the BPRN*
signal (ICAAM, CCU-I). About 30 nsec later, a CL/STPRN
is generated by the DAC. This allows the appropriate
processing element to be granted the system bus.
(Case 2) - If a CLREQ (or STREQ) signal occurs
after the INH signal, the CL/STPRN signal will be blocked,
which indicates that the system bus is in use. In this case,
BHD is high and PRE goes low, and BPRN is granted.

(Case 3) - If the INH signal and the CLREQ (or STREQ)
signal occur simultaneously, within a time window of 15 nsec,
the CLREQ (or STREQ) signal is blocked as before. If a
transient CL/STPRN signal occurs, the "GLITCH KILLER"
suppresses it and prevents the transient from propagating
to the central controller.

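The three DAC cases amount to a decision on the relative arrival times of CLREQ (or STREQ) and INH. The sketch below models only that decision and the resulting BHD, PRE, CL/STPRN, and BPRN levels; it leaves out the 30 nsec grant delay and the glitch killer. Treating a separation of exactly 15 nsec as outside the coincidence window is our assumption, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_NS = 15.0          # coincidence window described in Case 3


@dataclass
class DacOutputs:
    bhd: bool             # bus hold, positive logic
    pre_n: bool           # priority enable, negative logic (False = asserted)
    cl_st_prn: bool       # CL/STPRN passed toward the central controller
    bprn: bool            # local grant delivered through the RPC


def dac_decide(t_ext: Optional[float], t_inh: Optional[float]) -> DacOutputs:
    """Model the three DAC cases; times in nsec, None means the signal is absent.

    t_ext is the arrival time of CLREQ (or STREQ); t_inh that of INH.
    """
    if t_ext is None and t_inh is None:
        raise ValueError("at least one of CLREQ/STREQ or INH must occur")
    # Assumption: a separation of exactly WINDOW_NS counts as outside the window.
    if t_ext is not None and (t_inh is None or t_inh - t_ext >= WINDOW_NS):
        # Case 1: the external request clearly precedes INH, so CL/STPRN is
        # granted and the local BPRN* is frozen (BHD low, PRE high).
        return DacOutputs(bhd=False, pre_n=True, cl_st_prn=True, bprn=False)
    # Case 2 (INH first) or Case 3 (both within the window): CL/STPRN is
    # blocked and the local bus grant goes ahead (BHD high, PRE low).
    return DacOutputs(bhd=True, pre_n=False, cl_st_prn=False, bprn=True)


print(dac_decide(0.0, 40.0).cl_st_prn)   # True  (Case 1)
print(dac_decide(40.0, 0.0).bprn)        # True  (Case 2)
print(dac_decide(0.0, 10.0).cl_st_prn)   # False (Case 3)
```

Resolving every near-tie in favor of the local request means the two requests can never each hold one resource while waiting for the other, which is the deadlock the DAC is meant to avoid.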
c. Central Controller (CC) 

The central controller is a single board control- 
ler, which consists of two clocks and four identical units, 
each corresponding to one cluster in the star. The primary 
functions of the CC are: 

1) To arbitrate among the different CLREQ and STREQ signals
directed to a single cluster.

2) To enable and disable the CL/STPRN signal chain.

3) To enable and disable the appropriate bus switch links
of the complete star switch.

A block diagram of the CC is presented in Fig. 3.6. 

CLK-1 - Clock 1 is the main clock of the central
controller. Its frequency is 30 MHz. It is used to synchronize
and enable the arbitration function of the CSRA (cluster/star
request arbiter) and the four-phase clock, CLK-2.

CLK-2 - Clock 2 is a four-phase, anti-coincidence
clock. Its input is CLK-1, from which it generates four clocks,
one for each of the four CSRAs.

Figure 3.6 A block diagram of the central controller.

The functions of the four-phase clock are:

1) To synchronize the CLREQ (or STREQ) chain action via 
the CSRA in order to prevent deadlocks. The deadlock 
avoidance method used in this implementation is similar 
to the "spinning lock" method [192]. The spinning 
lock rotates at a frequency of 3.75 MHz (30/8 MHz).
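The 30/8 MHz figure implies that the spinning lock completes one turn every eight CLK-1 cycles. One way to realize an anti-coincidence four-phase clock with that period is a 3-bit counter whose phases each last two CLK-1 cycles. The sketch below models that arrangement; it is an assumption about how such a clock could work, not a description of the actual circuit, and the names are illustrative.

```python
CLK1_HZ = 30_000_000        # CLK-1, the central controller's main clock
CYCLES_PER_TURN = 8         # one full turn of the spinning lock


def active_phase(cycle: int) -> int:
    """Which of the four CLK-2 phases is active during a given CLK-1 cycle.

    Assumption: each phase is held for two CLK-1 cycles, so the four phases
    never overlap and together span one eight-cycle turn.
    """
    return (cycle % CYCLES_PER_TURN) // 2


print(CLK1_HZ / CYCLES_PER_TURN)             # 3750000.0 (3.75 MHz)
print([active_phase(c) for c in range(10)])  # [0, 0, 1, 1, 2, 2, 3, 3, 0, 0]
```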

CSRA (Cluster/Star Request Arbiter) - The CSRA
is a rotating priority resolver. Its primary functions are:

1) To arbitrate among requests from three other clusters 
within the same star and from the corresponding 
cluster in the neighboring star. 

2) To enable the selected request, after being synchro- 
nized with the spinning lock, to propagate to the 
requested cluster. 

The CSRA accepts four different requests to a single cluster
and grants one of them according to a rotating priority scheme.

CSPE (Cluster/Star Priority In Enable) - The CSPE
is a demultiplexer whose primary function is to enable the
CL/STPRN chain action. The CSPE is synchronized by the CSRA.
When a CLPRN is received from the requested cluster, the CSPE
enables the CLPRN chain action to the selected requesting
cluster.
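The CSRA's rotating priority and its synchronization with the spinning lock can be combined in one small model: a unit arbitrates only while its own CLK-2 phase is active, and each winner then drops to the lowest priority. The class, the phase assignment, and the numbering of the four inputs are hypothetical.

```python
from typing import Optional, Sequence


class CSRA:
    """Rotating-priority arbiter for the four requests aimed at one cluster.

    Assumed input numbering: 0-2 are the other three clusters of the star,
    3 is the corresponding cluster of the neighboring star.
    """

    def __init__(self, phase: int) -> None:
        self.phase = phase    # the CLK-2 phase this unit is synchronized to
        self.last = 3         # so input 0 starts with the highest priority

    def tick(self, active_phase: int, requests: Sequence[bool]) -> Optional[int]:
        """Grant one request, but only while this unit's clock phase is active."""
        if active_phase != self.phase:
            return None                       # wait for the spinning lock
        for step in range(1, 5):
            i = (self.last + step) % 4
            if requests[i]:
                self.last = i                 # the winner drops to lowest priority
                return i
        return None


unit = CSRA(phase=1)
pending = [True, False, True, True]
print([unit.tick(p, pending) for p in (0, 1, 1, 1, 1)])  # [None, 0, 2, 3, 0]
```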



SSEC (Star Switch Enable Circuit) - The SSEC
consists of a set of six drivers. It accepts the different
CLPRNs and generates two signals, ECC and DIR (together with
its complement, /DIR). ECC is a negative logic signal which
enables one of the bus switch links corresponding to the
CLPRN signal. DIR is a signal which sets the requesting
direction of the drivers in the selected link of the
"complete star" bus switch; /DIR is the inverted DIR signal.
The SSEC is responsible for enabling the six different links
of the complete star bus switch, as depicted in Fig. 3.7.
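A complete star of n clusters needs one link for every pair of clusters, n(n-1)/2 in all, which gives the six bus switches of Fig. 3.7 for four clusters. The sketch below enumerates those links and derives the ECC/DIR settings for a grant. The link numbering and the DIR convention are hypothetical, chosen only to show the selection logic.

```python
from itertools import combinations

CLUSTERS = "ABCD"
# Hypothetical numbering: one bus switch link per unordered pair of clusters.
LINKS = {pair: n for n, pair in enumerate(combinations(CLUSTERS, 2))}


def switch_controls(requesting: str, requested: str) -> tuple:
    """Return (link whose ECC is asserted, DIR) for a cluster-to-cluster grant.

    DIR is modeled as True when the requesting cluster is the first cluster of
    the pair; /DIR would simply be its complement.
    """
    if requesting == requested:
        raise ValueError("intra-cluster transfers do not use the star switch")
    pair = tuple(sorted((requesting, requested)))
    return LINKS[pair], pair[0] == requesting


print(len(LINKS))                  # 6 links = 4 * 3 / 2
print(switch_controls("A", "B"))   # (0, True)
print(switch_controls("D", "B"))   # (4, False)
```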

5. Intercommunication Procedures Among Resources 

Communication among the resources of this system is
governed by the following basic concepts: explicitly segmented
memory; an unshared local and shared global internal/external
memory hierarchy; an asynchronous process structure; and
a design decision that each single board computer is allowed
to use the system bus for the transfer of only one word of data
and then must release the system bus to other SBCs, except
when a prefix lock is executed by software. A software lock
grants the bus to that SBC for as long as that SBC needs it.
In general, this feature is not required frequently, so the
operating system will not normally be delayed waiting for the
system bus to be released in order to test a semaphore or any
other synchronization primitive.

In order to provide effective communication among all 
processing elements (within a single cluster, among different 
clusters in a single "star," and among "stars") and to arbi- 
trate the contention of bus usage (in star bus switch and 
inter-star bus switches), we have developed an intercommuni- 
cations system managed by distributed and central controllers, 
as described in Chapter III.D.4 and 5.

In order to describe the communication protocol among
different SBCs, a two-"star" system (STAR-1 and STAR-2) is
chosen, as depicted in Fig. 3.8. Several examples of different
types of communication are presented.

Figure 3.7 Diagram for the "Complete Star" Bus Switch Network
(Example: 4 Clusters, 6 Bus Switches)

a. Example #1 - Intra-Cluster Communication 

Intra-cluster communication is accomplished by 
means of data transfer via the cluster Multibus. This type 
of communication does not involve the central controller or 
any bus switch. The distributed controller resident in the 
specific cluster and on-board SBCs are the controllers of 
this communication link. 

For example, let us assume SBC-1 in cluster A1 
requests some information from SBC-2 in the same cluster. 
The sequence of events (Fig. 3.9) is: 

a) SBC-1 generates a BREQ signal.

b) The RPC of the distributed controller grants
the request and generates a BPRN* signal.

c) The ICAAM of the distributed controller
generates an IREQ signal for the inhibiter.

d) From the IREQ, the CIC generates an inhibit
signal which causes the DAC to send the appropriate
BHD and PRE signals.

e) These two signals are sent to the RPC to close the
chain, and a BPRN is generated.

f) The BPRN signal is applied to the arbiter circuit
of the corresponding SBC. From this point, a
regular Multibus transfer is executed.

These six events are necessary to establish any
intra-cluster communication, but they are not sufficient.
The following conditions, corresponding to requests from
other clusters and stars, must be examined:

1) Is any other cluster in the process of communicating
with this cluster?

2) Is any other star in the process of communicating
with this cluster?

For simplicity, this example assumes that no external
requests are involved in the process of intra-cluster
communication.

Figure 3.8 Diagram Showing Inter-Star and Intra-Star Interconnections
Using Bus Switches

ISBSW: Inter-Star Bus Switch
SBSW: Star Bus Switch (Intra-Star)

Upon termination of the data transfer via the
system bus, SBC-1 releases its BREQ signal, which releases
all resources held by SBC-1. The average time of a word
transfer is 1.65 µsec.

b. Example #2 - Inter-Cluster Communication (within a Star)

Inter-cluster communication is accomplished by
means of data transfer via the system buses (Multibus) of two
clusters and the bus switch interconnecting those two clusters.
This type of communication involves all controllers, the star
bus switch, and the on-board SBC arbiter (see Fig. 3.10).

Assume that SBC-1 in cluster A1 requests some
information from SBC-1 in cluster B1. The sequence of events
is:

1) SBC-1 of A1 generates BREQ signal, 

2) The RPC of the distributed controller in cluster A1 
locks on the request and generates a BPRN* signal. 

3) The BPRN* signal is applied to the ICAAM of the 
distributed controller. 

4) The ICAAM generates two signals: CLREQ-Bl, which 

propagates to the rotating priority arbiter of the 
central controller unit B and *'EXREQ" which is 
applied to the *'CIC'* coincidence inhibiter of the 
distributed controller of cluster Al. 
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Figure 3.10 State Diagram of Inter- Cluster Communicati 
(Intra-Star) 



5) The "CIC" coincidence inhibiter generates an appro- 
priate INH signal which will cause the distributed 
controller in cluster A to wait for a CLPRN from 
the demultiplexer of the central controller, unit B. 

6) The '’cluster/star request arbiter” in the central 
controller locks on the CLREQ-Bl signal and waits 
for the spinning lock to enable the CLREQ chain 
action and locks on the request. 

7) The CLREQ signal is applied to the DAC of the dis- 
tributed controller of cluster Bl. 

8) The DAC of the distributed controller of cluster Bl 
generates a CLPRN signal which is applied to the 
demultiplexer of unit B of the central controller. 

9) The central controller enables the CLPRN signal to
the "DAC" of the distributed controller in cluster
A, which generates the appropriate BHD and PRE signals.

10) The BHD and PRE signals are applied to the RPC and
close the chain action. The RPC then generates
the BPRN signal.

11) The BPRN signal is applied to the on-board SBC-1 
arbiter which starts the regular Multibus communi- 
cation. 

12) After event #9, a parallel process is initiated:
the bus switch enable. Two signals, DIR and ECC, are
sent to the bus switch that links the buses of
cluster A1 and cluster B1.

13) Those two signals prepare the switch for the coming 
data transfer. 
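The thirteen steps above form a request/grant (REQ/PRN) chain. As a reading aid, the sketch below records that chain for the A1 to B1 example as an ordered trace of (signal, source, destination) tuples. It is a hypothetical model, not the controller logic: the function name, the "unit@cluster" labels, and the attribution of DIR and ECC to the central controller (the text states that the central controller controls all bus switches) are assumptions, and the hardware's parallelism between steps 10-11 and steps 12-13 is flattened into one sequence.

```python
def inter_cluster_trace(src: str = "A1", dst: str = "B1") -> list[tuple[str, str, str]]:
    """Hypothetical ordered (signal, from, to) trace for Example 2.

    The central-controller unit serving the requested cluster is assumed
    to be named after the cluster's letter (e.g. "CCB" for cluster "B1").
    """
    cc = f"CC{dst[0]}"
    trace = []

    def emit(signal: str, frm: str, to: str) -> None:
        trace.append((signal, frm, to))

    emit("BREQ", f"SBC-1@{src}", f"RPC@{src}")          # step 1
    emit("BPRN*", f"RPC@{src}", f"ICAAM@{src}")         # steps 2-3
    emit(f"CLREQ-{dst}", f"ICAAM@{src}", f"CSRA@{cc}")  # step 4
    emit("EXREQ", f"ICAAM@{src}", f"CIC@{src}")         # step 4
    emit("INH", f"CIC@{src}", f"DC@{src}")              # step 5: hold until CLPRN
    # step 6: the CSRA waits for its spinning-lock phase before forwarding
    emit("CLREQ", f"CSRA@{cc}", f"DAC@{dst}")           # step 7
    emit("CLPRN", f"DAC@{dst}", f"DEMUX@{cc}")          # step 8
    emit("CLPRN", f"DEMUX@{cc}", f"DAC@{src}")          # step 9
    emit("DIR", cc, f"SWITCH@{src}-{dst}")              # step 12 (parallel branch)
    emit("ECC", cc, f"SWITCH@{src}-{dst}")              # step 12 (parallel branch)
    emit("BHD", f"DAC@{src}", f"RPC@{src}")             # steps 9-10
    emit("PRE", f"DAC@{src}", f"RPC@{src}")             # steps 9-10
    emit("BPRN", f"RPC@{src}", f"SBC-1@{src}")          # steps 10-11
    return trace


for signal, frm, to in inter_cluster_trace():
    print(f"{signal:9s} {frm:>12s} -> {to}")
```

A useful invariant to read off the trace is that the requesting SBC's BPRN appears only after CLPRN has reached the requesting DAC, which is the role the INH signal plays in step 5.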

The initialization of the bus switch terminates
200 nsec before the transfer of data via the bus switch.
This makes the bus switch transparent to the requesting
cluster, and both clusters are linked on a longer system
bus for the duration of the transfer. SBC-1 in cluster
A1 can use the "longer" system bus (two system buses and the
bus switch) for more than one word transfer if this is
requested by a software bus lock instruction from SBC-1.
The process terminates when SBC-1 of cluster A1 releases its
BREQ signal. This event releases all resources held by
SBC-1 of cluster A1.

The sequence of events described in this example
is the one required for this type of communication; other
external events were omitted to simplify the example.
This sequence of events takes place in an average time of
2.1 µsec.

c. Example #3 - Inter-Star Communication 

Inter-star communication is accomplished by means
of data transfer via the system buses of two clusters and the
bus switch interconnecting these two clusters. This type of
communication involves all controllers and the bus switch
interconnecting the two clusters. The sequence of events is
similar to that of the previous example, except that a STREQ
signal instead of a CLREQ signal is applied to the central
controller, and the responding signal is STPRN (see Fig. 3.11).

Examples 1, 2, and 3 describe cases in which the
communication levels are separable. In a real application, the
situation can be more complicated; for example, all three types
of communication may occur simultaneously. In such a case,
deadlocks could occur frequently [193]. To prevent these
deadlocks, two methods of deadlock avoidance are used.
Figure 3.11 State Diagram of Inter-Star Communication

The first method, the "suspend lock," is implemented in
the DAC of the distributed controller. The following example
explains how this method works.

d. Example #4 - Deadlock Avoidance I - Suspend Lock

SBC-i in cluster A1 of star 1 requests SBC-j in
cluster A2 of star 2 (process P1), and SBC-k in cluster A2 of
star 2 requests SBC-ℓ in cluster A1 of star 1 (process P2),
with 1 ≤ i, j, k, ℓ ≤ 8. Let us assume that the two request
processes P1 and P2 both progress to state No. 3 (Fig. 3.12).
At this point of execution, the processes P1 and P2 are holding
the following resources:

P1: {RPC-DC-A1, ICAAM-DC-A1, CSRA/CCB1, DAC-A1, CIC-A1}

P2: {RPC-DC-A2, ICAAM-DC-A2, CSRA/CCA2, DAC-A2, CIC-A2}

At this point, each process requests the DAC
located in the other distributed controller. But the two
DACs are held by the requesting processes and are unavailable.
It seems that we have a deadly embrace (deadlock).

The DAC is designed to avoid such a case. One
of the DACs (called the first DAC, according to the time
of arrival of the requests) suspends the lock of the second
DAC by releasing some of the resources that are held by the
second requesting process. In this way the first requesting
process advances while the second is suspended and waits for
the first process to terminate.
This deadlock could happen if the suspend lock method were not
used when the two requesting clusters are located in different
stars, because the spinning locks of the two central
controllers are not synchronized. The spinning lock is
therefore of limited use for inter-star communication, which is
why two types of deadlock avoidance are needed: the suspend lock
method is used to prevent deadlock in inter-star communication.
Synchronizing the spinning locks of the different central
controllers of a multi-star system is undesirable for fault
tolerance, and it may sometimes be impossible.
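Example #4 is a classic circular wait. The sketch below models it with a wait-for graph: each process's blocked request points at the process holding that resource, and a cycle means deadlock. The `suspend_lock` function is a hypothetical abstraction of what the text says the first DAC does (force the second process to give up a resource and suspend it); releasing exactly the contested DAC is our assumption, since the text says only that "some of the resources" are released.

```python
def holder_of(resource, holdings):
    """Return the process currently holding `resource`, or None."""
    for proc, held in holdings.items():
        if resource in held:
            return proc
    return None


def has_deadlock(holdings, requests):
    """Detect a cycle in the wait-for graph.

    holdings: process -> set of resources it holds
    requests: process -> the single resource it is blocked on
    """
    wait_for = {}
    for proc, res in requests.items():
        owner = holder_of(res, holdings)
        if owner is not None and owner != proc:
            wait_for[proc] = owner
    # Each process waits on at most one other, so following the edges from
    # any start either stops (no wait) or revisits a node (a cycle).
    for start in wait_for:
        seen, node = set(), start
        while node in wait_for:
            if node in seen:
                return True
            seen.add(node)
            node = wait_for[node]
    return False


def suspend_lock(holdings, requests, first, second):
    """Hypothetical suspend lock: `second` gives up the resource `first`
    is blocked on; `second` keeps its own request and stays suspended."""
    wanted = requests.pop(first)
    holdings[second].discard(wanted)
    holdings[first].add(wanted)


holdings = {
    "P1": {"RPC-DC-A1", "ICAAM-DC-A1", "CSRA/CCB1", "DAC-A1", "CIC-A1"},
    "P2": {"RPC-DC-A2", "ICAAM-DC-A2", "CSRA/CCA2", "DAC-A2", "CIC-A2"},
}
requests = {"P1": "DAC-A2", "P2": "DAC-A1"}
print(has_deadlock(holdings, requests))   # True
suspend_lock(holdings, requests, first="P1", second="P2")
print(has_deadlock(holdings, requests))   # False
```

After the suspension, P2 still waits for P1, but P1 no longer waits for anyone, so the cycle is broken and P1 can run to completion.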

The second method of deadlock avoidance is the
"spinning lock" method. This method is used to prevent
deadlocks which may occur in inter-cluster or intra-cluster
communication within the same star. If for any reason this
method fails to prevent a deadlock, the "suspend lock" method
takes over and prevents the deadlock. Two different methods
are used to reduce the overhead created by the suspend lock
method and to increase fault tolerance.

CLK-2 in the central controller is a four-phase 
anti-coincidence clock as shown in Fig. 3.22. This clock is 
the "spinning lock" generator. 

e. Example #5 - Deadlock Avoidance II - Spinning Lock (Fig. 3.12)

Figure 3.12 State Diagram for a Deadlock Example

Let us assume that SBC-i in cluster A requests
SBC-j in cluster B and SBC-k in cluster B requests SBC-ℓ in
cluster A. These requests are all for SBCs residing in the
same star. If the two requests are sent simultaneously to
the CSRA of CCA and the CSRA of CCB, respectively, of the
central controller, they will eventually progress to the
deadlock condition explained in Example #4. To prevent this,
the CSRA of the central controller is designed with two
"lock in request" phases:

1) The first phase is implemented by the rotating 
priority arbiter. 

2) The request selected by the first arbiter propagates 
to the "spinning lock" circuit which will lock on 
the request only when CLK-2 goes low.

CLK-2 has four phases. Since only one phase is low at any given
time, it is impossible for both requests to leave the central
controller at the same time for the distributed controllers of
the requested clusters; this eliminates the race condition and
hence the deadlock. A race condition occurs when the scheduling
of two processes is so critical that different orders of
scheduling them produce different results [192]. The minimum
time separation that the spinning lock imposes between the
requesting processes is equal to the anti-coincidence time
of CLK-2 (Fig. 3.22).
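The two-phase lock-in can be illustrated with a discrete-time sketch. Time is counted in CLK-2 phase slots, and each central-controller section may forward a latched request only in its own slot, so two sections can never forward in the same slot. This is a simplified model under stated assumptions: one slot per phase, sections numbered 0 to 3, and the dead band between phases (the anti-coincidence time) collapsed into the slot boundary.

```python
PHASES = 4  # CLK-2 has one non-overlapping phase per central-controller section


def clk2_low(section: int, slot: int) -> bool:
    """Phase `section` of CLK-2 is low only when slot % 4 == section,
    so no two phases are ever low in the same slot."""
    return slot % PHASES == section


def dispatch_slots(latched: dict[int, int]) -> dict[int, int]:
    """For each section whose rotating priority arbiter latched a request
    at slot t0 (first phase), return the slot in which the spinning lock
    lets the request leave the central controller (second phase)."""
    out = {}
    for section, t0 in latched.items():
        slot = t0
        while not clk2_low(section, slot):
            slot += 1
        out[section] = slot
    return out


# Sections 0 (CCA) and 1 (CCB) latch conflicting requests in the same slot.
print(dispatch_slots({0: 0, 1: 0}))  # {0: 0, 1: 1}
```

Because each section's dispatch slot has a distinct residue modulo 4, the requests always leave at different times, whatever slots they were latched in.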

6. Multibus Communication

Two arbitration circuits are used in Multibus
communication: the on-board SBC arbiter, called the Bus Arbiter,
and the RPC of the distributed controller.

The Bus Arbiter provides several resolving techniques
based on the priority concept that, at a given time, one SBC
has priority above all the rest. The RPC can be regarded as
a parallel priority resolver. A parallel priority resolving
technique has a separate bus request (BREQ) line for each
arbiter on the system bus (Multibus). Several BREQ lines enter
the RPC input. For each BREQ line, there is a corresponding
BPRN (bus priority in) line at the output of the RPC.

Only one BPRN signal can be activated at any given time. 

This BPRN signal is returned to the highest priority
requesting bus arbiter. The bus arbiter receiving priority (BPRN
active low) then allows its associated SBC onto the multi-master
system bus as soon as the bus becomes available (i.e., when
it is no longer busy). When one bus arbiter gains priority
over another arbiter, it cannot immediately seize the bus;
it must wait until the present bus occupant completes its
transfer cycle. Upon completing its transfer cycle, the
present bus occupant recognizes that it no longer has priority
(BPRN goes high) and surrenders the bus, releasing the Busy
signal. Busy is an "active low" signal line which goes to
every bus arbiter on the system bus and is tied to the other
Busy signals through an OR gate. When Busy goes high, the
arbiter which presently has bus priority (BPRN active low)
seizes the bus and pulls Busy low to keep other arbiters
off the bus (see the timing diagram, Fig. 3.13).
Note that all multi-master system bus transactions are
synchronized to the bus clock (BCLK). This gives the parallel
priority resolving circuit time to settle and make a correct
decision. Fig. 3.14 depicts the interconnections between the
bus arbiters and the RPC.
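The parallel resolving and Busy handoff described above can be sketched as two small functions. This is a behavioral illustration under stated assumptions, not a gate-level model: signals are booleans with True meaning asserted (the real lines are active low), and the priority order is passed in as a parameter because the rule by which the RPC orders the requesters is not modeled here.

```python
def resolve(breq: list[bool], priority: list[int]) -> list[bool]:
    """Parallel priority resolver: assert BPRN only for the highest-priority
    arbiter whose BREQ is asserted. `priority` lists arbiter indices from
    highest to lowest priority."""
    bprn = [False] * len(breq)
    for idx in priority:
        if breq[idx]:
            bprn[idx] = True
            break
    return bprn


def bus_owner_after_cycle(owner, bprn, cycle_done):
    """Owner of Busy after a BCLK edge. The occupant is never pre-empted
    mid-cycle; at the end of its cycle it keeps the bus only if it still
    has BPRN, otherwise the arbiter holding BPRN seizes the bus."""
    if owner is not None and (not cycle_done or bprn[owner]):
        return owner
    return bprn.index(True) if True in bprn else None


breq = [False, True, True, False]           # arbiters 1 and 2 request the bus
bprn = resolve(breq, priority=[3, 2, 1, 0])
print(bprn)                                              # [False, False, True, False]
print(bus_owner_after_cycle(1, bprn, cycle_done=False))  # 1 (still mid-cycle)
print(bus_owner_after_cycle(1, bprn, cycle_done=True))   # 2
```

At most one BPRN is ever asserted, which mirrors the statement that only one BPRN signal can be active at any given time.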
Fig. 3.13 Timing Diagram of Bus Arbiter and Random Priority
Controller




Fig. 3.14 Interconnection of Random Priority Controller 
and Bus Arbiters 
In our configuration, every master currently using
the bus will surrender the bus upon completing its transfer
cycle (unless a bus lock is executed). This property is
achieved by tying the CBREQ (common bus request) lines
of all bus arbiters to ground. CBREQ is an active low signal
which indicates to the current master on the bus that the bus
has been requested by another master.

Two other signals, LOCK and CRQLCK, add to the
flexibility of the bus arbiter within the system configuration.
LOCK is a signal generated by the processor to prevent the
bus arbiter from surrendering the multi-master system bus to
any other master, of either higher or lower priority. CRQLCK
(common request lock) serves to prevent the bus arbiter from
surrendering the bus to a lower priority bus master when
conditions warrant it. LOCK is used for implementing software
semaphores for critical code sections and for real-time critical
events (such as memory refresh or hard disk transfers).
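The conditions under which a master gives up the bus, as stated in the last two paragraphs, reduce to a small decision function. The sketch below is a simplified, hypothetical model of that text, not a timing-accurate model of the bus arbiter; the parameter names are ours, and signals are booleans with True meaning asserted.

```python
def surrenders_bus(cycle_done: bool, lost_bprn: bool, cbreq: bool,
                   lock: bool, crqlck: bool) -> bool:
    """Does the current master release the bus at this point?

    cycle_done: the current transfer cycle has finished
    lost_bprn:  a higher-priority master now holds BPRN
    cbreq:      CBREQ asserted (another master has requested the bus)
    lock:       processor LOCK asserted
    crqlck:     common request lock asserted
    """
    if not cycle_done or lock:
        return False               # never mid-cycle; LOCK holds off every master
    if lost_bprn:
        return True                # higher-priority request: CRQLCK does not block it
    return cbreq and not crqlck    # lower-priority request via CBREQ, unless CRQLCK


# With CBREQ tied to ground (always asserted), the master surrenders after
# every transfer cycle unless a lock is in effect.
for lock in (False, True):
    print("LOCK" if lock else "no LOCK",
          surrenders_bus(True, False, cbreq=True, lock=lock, crqlck=False))
```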

In describing the three types of communication we
referred to PRN and REQ chains. The following state
diagrams depict those chains:

1) Intra-cluster communication

2) Inter-cluster communication

3) Inter-star communication
D. PRESENTATION OF RESULTS

1. Introduction

The important hardware components developed in this
thesis to support this multiple microcomputer system are the
following:

Interconnection:
Intra-cluster -- Multibus
Inter-cluster -- Complete-Star Bus Switch Network

Intercommunication Control (three levels):
Random-Priority Controller
Distributed Controller
Central Controller

In this section, we will present representative test 
results to answer two major questions. 






1) Did our design work? 

2) How well did it work? 

Since the Multibus was developed by Intel and is well
documented [196], we decided not to report its operation here.
We will describe the operational results of the bus switch
and the three levels of intercommunication control.

How well they work together in a computational
environment will be reported in Chapter IV, where the
implementation of an adaptive spatial filter on the multiple
microcomputer system is described.

2. Bus Switches

The function of a bus switch is to transmit signals
from the Multibus in one cluster to the Multibus in another
cluster. For four clusters, the "complete star bus switch
network" designed has six branches of bus switches, as shown
in Fig. 3.7. Although Intel's Multibus has 86 lines,
we decided that only 58 of them need to be switched to
facilitate communication between two SBCs in different
clusters. Therefore, one "bus switch" includes the circuits
needed to transmit 58 signals, including data, address and
control signals.
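The count of six branches follows from connecting every pair of clusters directly. The short sketch below assumes that a "complete star" means one bus switch per unordered pair of clusters in a star (which matches the six branches for four clusters) and also gives the total number of switched signal paths per star.

```python
SWITCHED_LINES = 58  # Multibus lines carried by one bus switch (of 86 in total)


def bus_switch_count(clusters: int) -> int:
    """Number of bus switches in a complete star: one per pair of clusters."""
    return clusters * (clusters - 1) // 2


switches = bus_switch_count(4)
print(switches, switches * SWITCHED_LINES)  # 6 348
```

The count grows quadratically (for example, 28 switches for 8 clusters), which is the usual cost of fully connecting the clusters of a star.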

Four figures will be used to describe the behavior
of the bus switch. The first three figures show the
improvement of the signal waveforms after the bus
switch. The signals shown are the following:

One data bit - Fig. 3.15a
One address bit - Fig. 3.15b
One control signal - Fig. 3.15c

Each figure consists of two traces: the top trace shows the
waveform before the switch, and the lower trace shows the
waveform after the switch. In all three cases the waveforms
after the switch are better because their rise times are
shorter, giving a sharper pulse. Note the noise appearing on
these three signals; it is typical of the real operational
environment. The control signal in Fig. 3.15c is the
Acknowledge signal (XACK) generated by the SBC requesting the
use of the system bus.

The behavior of the bus switch is also described by
Fig. 3.20, which shows the delay of the switch. Again, the
top trace is before the switch and the bottom trace is after
the switch. The delay is no more than 25 nsec.

These four figures demonstrate that our bus switches
are adequate to provide communication between two Multibuses
running at 10 MHz.

Figure 3.15 The input and output waveforms of three
selected signals to demonstrate the performance of the bus
switch: (a) a data bit, (b) an address bit, (c) a control
signal, "Acknowledge" (XACK).
Top trace: input to the bus switch.
Bottom trace: output of the bus switch.

3. Random Priority Controllers (RPC)

The function of the random priority controllers is to
arbitrate requests for bus usage from many SBCs, either
from the same cluster or from several clusters. If an SBC
from another cluster wants the Multibus to communicate either
with another SBC or with the Global RAM, the two higher levels
of control (the central controller and the two distributed
controllers associated with this cluster and with the other
cluster where the requesting SBC resides) must also participate
in the control function. However, final control rests with
the RPC because it is the circuit which grants the bus usage
signal, BPRN (Bus Priority In). One RPC is used for every
Multibus, so there are four RPCs in each star.

The behavior of our RPC will be described by four 
figures using the BPRN signals (Bus Priority In) of the SBCs 
requesting the bus. A BPRN low signal means the SBC has been 
granted the bus and is using it. 

a. Sharing of the Multibus by Two SBCs. 

Fig. 3.16 shows the BPRNs of two SBCs. The bus usage
pattern was created by software. Each unit of low BPRN
represents a transfer of one word. If there is no request for
bus usage by other SBCs, the SBC currently using the bus
holds it, as shown by BPRN staying low for a longer period of
time. The figure shows the interleaving of bus usage by
these two SBCs, indicating that the RPC works rapidly and
efficiently to serve both SBCs.

b. Slow-Down of Bus Release Due to Refresh of Dynamic RAM

However, we discovered that the SBC using the
bus may not release the bus after its one-word transfer,
as shown by a wide gap in Fig. 3.17, even though the other SBC
was requesting the bus. We found that this is inherent in
Intel's 8612 design: while the dynamic RAM is being
refreshed, the SBC will not release the bus. This drawback
cannot be removed without redesigning the 8612 SBC.

Figure 3.16 Bus Priority In signals of two SBCs to demonstrate
the arbitration of their usage of the bus by the
random priority controller

Figure 3.17 Bus Priority In signals of two SBCs to demonstrate
the effect of dynamic RAM refresh on the bus usage

Figure 3.18 Bus Priority In signals of four SBCs to demonstrate
the arbitration of their usage of the bus by the
random priority controller (traces: BPRN of SBC1, SBC2,
SBC3 and SBC4)

c. Sharing of Multibus by Four SBCs 

Fig. 3.21 shows the BPRN signals of four SBCs.
Their general patterns are similar in the sense that there
is no large gap in any one of these traces, indicating that
no SBC dominates the bus and none is left out.
This "uniform" and "equal" treatment of all SBCs requesting
the bus is exactly what the RPC is designed to provide.

d. Behavior of RPC When the Bus is Saturated 

We prepared the most severe test for the RPC by
programming four SBCs to request the bus all the time. In
real applications, this condition should never be allowed to
happen; it represents very poor application programming.
However, it is a tough test for the RPC. Fig. 3.19
shows the BPRN of four SBCs. The interleaving of bus usage
is no different from that in the previous three figures.
However, it is important to note that the bus was first shared
by SBC1 and SBC3 for 12 transfers and then shared by SBC2 and
SBC4 for another 12 transfers, followed by repetitions of this
pattern. Two properties cause this pattern.
First, the RPC is designed based on a binary tree selection;
therefore, two SBCs are granted first, followed by
the other pair. Second, the 12 transfers between SBC1 and SBC3
are determined by the basic design of the 8086 instruction
queue, which is a six-byte FIFO.
Figure 3.19 Bus Priority In signals of four SBCs which
request the bus 100% of the time, to demonstrate the
function of the random priority controller

Figure 3.20 Waveforms of input and output signals of a
bus switch to demonstrate the operation of the switch
(top: input signal to a bus switch; bottom: output signal
waveform from a bus switch)

Figure 3.21 Bus Priority In signals of four microcomputers,
each requesting 20% usage of the Multibus, to demonstrate
the operation of the random priority controller in this
example of heavy bus requests (80% bus request)
This demonstration clearly indicates that our
RPC is able to arbitrate among four SBCs under the most
demanding bus contention, a situation which should never be
allowed to occur in a real application.

4. Central Controller

The function of the central controller is to arbitrate
requests for inter-cluster and inter-star communication. It
works jointly with the distributed controllers to search,
select and synchronize these requests. Although there is only
one central controller for a star, it has four sections, one
for each cluster in the star.

The important components of each section in the
central controller are the CSRA and the CSPE. All four sections
are synchronized by two clocks: CLK1 for the searching and
selecting of requests, and CLK2 for their synchronization.

Two figures will be used to demonstrate their operation.

a. Searching/Selecting Clock (CLK1) and
Synchronization Clock (CLK2)

These two clocks are the heartbeats of the
intercommunication network. Note that CLK2 is not
independent, because it is generated from CLK1. Fig. 3.22
shows their mutual relationship. The third trace is CLK1;
below it are the four-phase CLK2 signals for the four clusters.
It is important to note that there is no overlap among them,
which avoids any undesirable coincidence. CLK1 runs at a
higher frequency so that all requests from other clusters
and stars are searched and selected at adequate rates. Once
a request is selected, it is synchronized by CLK2 and sent
on to the appropriate cluster.

b. Searching and Selection of Requests

Fig. 3.23 shows the functions of the CSRA and CSPE
circuits of central controller A. Four signals are shown
in the top half of the figure, representing the requests
from clusters B, C and D and from cluster A of another star,
respectively. The lower half of the figure shows the star
or cluster grant signals to the other star and to clusters
D, C and B, respectively. It is important to note that these
CLPRN (or STPRN) signals do not overlap although the request
signals do. It can be seen that cluster C sent its CLREQ first
and got its CLPRN. However, cluster D sent its CLREQ before
cluster C finished its request. Such an occurrence is
generally not allowed in real applications because an SBC is
allowed to transfer one word of data and must then release the
bus unless a software bus lock is ordered. However, this test
is designed to challenge the central controller. In this case,
the CSRA/CSPE of CCA allows cluster C to complete its request
period and then awards a CLPRN to cluster D. This figure
clearly demonstrates that, with a mix of request signals from
three clusters and one star, some overlapping and some not,
the central controller is able to take in these requests, sort
them out, select one at a time and award "cluster grant"
appropriately.
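The behavior in Fig. 3.23 (overlapping requests in, non-overlapping grants out) can be sketched as a simple serializer. This is an illustrative model only: the time units are arbitrary, and first-come first-served ordering with ties broken by list position is an assumption standing in for the CSRA's rotating priority arbitration, whose exact rules are not reproduced here.

```python
def serialize_grants(requests):
    """Grant requests one at a time without overlap.

    requests: list of (name, arrival, hold) tuples. A request is granted at
    its arrival time or when the current grant ends, whichever is later, and
    keeps the grant for `hold` time units. Ties in arrival time are broken by
    list position (a stand-in for the rotating priority arbiter).
    Returns a list of (name, grant_start, grant_end).
    """
    order = sorted(range(len(requests)), key=lambda i: (requests[i][1], i))
    grants, busy_until = [], None
    for i in order:
        name, arrival, hold = requests[i]
        start = arrival if busy_until is None else max(arrival, busy_until)
        grants.append((name, start, start + hold))
        busy_until = start + hold
    return grants


# Cluster C asks first; D asks before C has finished; B asks later.
print(serialize_grants([("C", 0, 5), ("D", 3, 4), ("B", 10, 2)]))
# [('C', 0, 5), ('D', 5, 9), ('B', 10, 12)]
```

As in the measured traces, cluster D's overlapping request is not rejected; its grant is simply deferred until cluster C's grant ends.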
Figure 3.22 Two Clocks in Central Controller for
Searching/Selection and Synchronization
of Requests from Stars and Clusters

Figure 3.23 Demonstration of the Functions of CSRA and
CSPE Circuits in the Central Controller
(Section A for Cluster A)
Top: four request signals input to the CSRA (from DCB, DCC,
DCD and Star A). Bottom: four priority-in signals output from
the CSPE (to Star A, DCD, DCC and DCB).
Of course, this does not complete the intercommunication
task. The CLPRN will be sent to the distributed controller to
initiate further control actions to complete the total task
of communication between two SBCs.

5. Distributed Controller

The function of the distributed controller is the
same as that of the central controller; both must work with
the RPC to complete the intercommunication. The central
controller is located away from the Multibus and also controls
the operations of all bus switches. The distributed controller
is mounted on the Multibus, one per cluster; therefore, there
are four distributed controllers in a star. The important
components of each distributed controller are:

ICAAM (Intra-cluster advanced activities monitor) 

CIC (Coincidence inhibitor circuit) 

DAC (Deadlock avoidance circuit) 

Four figures will be used to demonstrate their operation.
Eight control signals in the distributed controller are used
in these figures:

BREQ 

CLREQ* 

Internal/External Signal 

Inhibit 

PRE 

BHD 

CLPRN 

BPRN 

The first and eighth control signals, BREQ and BPRN,
are two of the most important ones because they are directly
connected to the SBCs. We must remember that all the buses,
switches and controllers are supporting circuits that help the
SBCs compute and talk among themselves efficiently. The
SBCs are the originators and receivers of the data,
communication and control signals.

a. Intra-Cluster Communication 

Fig. 3.24 shows the sequence of events in a test
case where one SBC in a cluster wants to talk to another SBC
in the same cluster.

It can be seen that CLREQ* (second trace) is high,
which means there is no request from another cluster. CLPRN
(seventh trace) is therefore also high, i.e., no cluster
priority signal is granted by the central controller.

Note the small delays between BREQ, PRE and BPRN.

b. Inter-Cluster/Intra-Star Communication 

Fig. 3.25 shows the sequence of events in a test 
case where an SBC in one cluster wants to talk to an SBC in 
another cluster within the same star. 

There are several interesting points when this
case is compared with the intra-cluster case:

• Both BREQ and CLREQ* are present.

• The Inhibit signal is active to prevent any premature
generation of BPRN.

• CLPRN is also active in response to CLREQ*.

This inter-cluster communication is clearly handled
correctly by the distributed controller.
Figure 3.24 Eight Control Signals to Demonstrate
the Function of Distributed Controller
for Arbitration of Intra-Star and
Intra-Cluster Communication
(traces, top to bottom: BREQ, CLREQ, INT/EXT, INH, PRE, BHD,
CLPRN, BPRN)

Figure 3.25 Eight Control Signals to Demonstrate
the Function of Distributed Controller
for Arbitration of Intra-Star and
Inter-Cluster Communication
(traces, top to bottom: BREQ, CLREQ, INT/EXT, INH, PRE, BHD,
CLPRN, BPRN)

Figure 3.26 Eight Control Signals to Demonstrate
the Function of Distributed Controller
for Arbitration of Inter-Star Communication
(traces, top to bottom: BREQ, STREQ*, INT/EXT, INH, PRE, BHD,
STPRN, BPRN)
c. Inter-Star Communication

Figure 3.26 shows the sequence of events in a
test case where an SBC in one cluster of a star wants to talk
to an SBC in the corresponding cluster of a neighboring star.
The traces are quite similar to those of the
inter-cluster/intra-star case in Fig. 3.25, with two changes:

The second trace is now STREQ* instead of CLREQ*.

The seventh trace is now STPRN instead of CLPRN.

The rest of the signals behave much the same. This shows
that requests from a cluster in the same star and from a
neighboring star are treated in the same way.
IV. IMPLEMENTATION OF ADAPTIVE FILTER
ON MULTIPLE MICROCOMPUTER SYSTEM



A. INTRODUCTION 

1. Selection of Microcomputer

The goal of this thesis research was to close
the gap between the theoretical development of image
processing algorithms and the experimental development of their
implementation on processor systems which are good
candidates for practical applications.

In this thesis, a multiple microcomputer system was 
chosen as the processor system candidate. 

It should be recognized that only during the past
two to three years have 16-bit microcomputers been seriously
considered for signal processing implementations. Although
8-bit microcomputers have been investigated for performing
signal processing operations, these studies were mainly
motivated by exploring what 8-bit microcomputers can do for
signal processing. For serious implementations, bit-slice
microprocessors, which can be designed to emulate 16-bit,
32-bit or even longer word computers, have always been the
favored approach. However, 16-bit microcomputers are being
supported by increasingly powerful hardware and software and
are approaching low-end minicomputer performance.

To examine the signal processing performance of
today's 16-bit MOS microcomputers, we coded the statistical
3x3 spatial filter on one mainframe computer, the IBM 360/67,
and two 16-bit microcomputers, the DEC LSI-11 and the Intel
8612, using high-level programming languages and a single
precision numerical data format. Fortran is used for the IBM
and DEC computers; PLM86 is used for the Intel computer. The
execution times in seconds are shown in Table IV.1 for
comparison.

TABLE IV.1

IMAGE PROCESSING EXECUTION TIME
(in seconds)

Image Processing Operations      IBM 360/67        DEC LSI-11        Intel 8612        Intel 8612
                                 Fortran,          Fortran,          PLM86,            Macro,
                                 Single Precision  Single Precision  Single Precision  Integer

Spatial Statistics Calculation   4.07              25.46             334.25            0.72
Spatial Filter Design            0.0047            0.24              2.82              -
Perform Spatial Filter           0.98              5.62              79.8              0.47

It can be seen that the LSI-11 currently has better floating
point computation support than Intel's 8612, which took 13 to 14
times longer than the LSI-11 to perform these image processing
operations. The LSI-11 itself took approximately 6 times longer
than the IBM 360/67. Based on this comparison, the LSI-11 would
be the natural choice of 16-bit microcomputer candidate. However,
Intel's 8612 was selected because of its larger physical
memory addressing space and its system Multibus support, which
are much better suited to multiple microcomputer system
development.
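The ratios quoted above can be checked directly against Table IV.1. The short script below computes them from the single precision columns. For the two long-running operations it gives 8612/LSI-11 ratios of about 13.1 and 14.2 and LSI-11/IBM ratios of about 6.3 and 5.7, while the very short Spatial Filter Design step gives different ratios (about 11.8 and 51), which the summary in the text does not cover.

```python
# Execution times in seconds from Table IV.1 (single precision, high-level language).
TIMES = {
    "Spatial Statistics Calculation": {"IBM": 4.07, "LSI-11": 25.46, "8612": 334.25},
    "Spatial Filter Design":          {"IBM": 0.0047, "LSI-11": 0.24, "8612": 2.82},
    "Perform Spatial Filter":         {"IBM": 0.98, "LSI-11": 5.62, "8612": 79.8},
}


def slowdown(times):
    """Return {operation: (8612 time / LSI-11 time, LSI-11 time / IBM time)}."""
    return {op: (t["8612"] / t["LSI-11"], t["LSI-11"] / t["IBM"])
            for op, t in times.items()}


for op, (intel_vs_lsi, lsi_vs_ibm) in slowdown(TIMES).items():
    print(f"{op:32s} 8612/LSI-11 = {intel_vs_lsi:5.1f}  LSI-11/IBM = {lsi_vs_ibm:5.1f}")
```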
Further, two of the three spatial filter modules were
coded in assembly language using a 32-bit integer data format
on the 8612. The execution times were found to be quite
short, suggesting that even today's Intel 16-bit microcomputer,
without the assistance of hardware arithmetic devices, can
perform these rather sophisticated image processing operations
very well compared with the mainframe IBM 360/67.
More specifically, it took 0.72 seconds to compute the
autocorrelation matrix elements for the 3x3 spatial filter,
averaged over the 32 x 32 image, and 0.47 seconds to perform
the 3x3 spatial filtering over the image.
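To make the two timed operations concrete, the sketch below shows one plausible form of them: accumulating the 9x9 autocorrelation matrix of 3x3 neighborhoods over a 32 x 32 image, and applying a 3x3 filter, both in integer arithmetic. It is an illustration, not the thesis code: skipping border pixels, the row-major neighborhood ordering, the function names and the synthetic test image are our assumptions, and the filter weights (which come from the filter design step) are simply taken as given. Assuming 8-bit pixels, each product is at most 255 x 255 = 65,025 and the 900 interior windows give sums below 6 x 10^7, so the accumulators fit comfortably in 32-bit integers.

```python
def window(img, r, c):
    """The 3x3 neighborhood centered at (r, c) as a 9-element row-major list."""
    return [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def autocorrelation_sums(img):
    """9x9 matrix of sums of x_i * x_j over every interior 3x3 window.

    Integer arithmetic only; dividing by the number of windows gives the average.
    """
    rows, cols = len(img), len(img[0])
    R = [[0] * 9 for _ in range(9)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            x = window(img, r, c)
            for i in range(9):
                for j in range(i, 9):      # R is symmetric: fill the upper triangle
                    R[i][j] += x[i] * x[j]
    for i in range(9):
        for j in range(i):
            R[i][j] = R[j][i]              # mirror into the lower triangle
    return R


def filter3x3(img, w):
    """Apply 3x3 integer weights w (row-major, 9 values) to every interior pixel."""
    rows, cols = len(img), len(img[0])
    return [[sum(wk * xk for wk, xk in zip(w, window(img, r, c)))
             for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


# Synthetic 32 x 32 test image with 8-bit pixel values (illustration only).
img = [[(7 * r + 3 * c) % 256 for c in range(32)] for r in range(32)]
R = autocorrelation_sums(img)
windows = (len(img) - 2) * (len(img[0]) - 2)    # 900 interior windows
identity = [0, 0, 0, 0, 1, 0, 0, 0, 0]          # passes the center pixel through
out = filter3x3(img, identity)
```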

2. Implementation

In this chapter we present the implementation
results of our adaptive filter on the multiple microcomputer
system. Section B discusses the performance of spatial
filters, and Section C the performance of adaptive spatial
filters.

The functions of the various interconnection and
communication controller components have been described in
previous sections, mainly using signals generated by function
generators. In this section, a test program was used to test
and evaluate the data transfer behavior of the system. This
program is quite straightforward: it fetches data from RAM
and displays them on a CRT terminal. However, the program
and data are located in different parts of the system to
provide a thorough test of the data transfer and bus
arbitration behavior.
Three tests were made.

The objective of the first two tests is to measure
the maximum rate of data transfer on the system bus. For
this purpose, both the program and the data were stored either
in the global RAM located in another slave SBC, as in test
case 1, or in the global RAM located on the µPRO RAM board,
as in test case 2. The system bus was therefore heavily
loaded, because not only the data but also the program itself
had to be read over the bus from memory external to the
testing SBC.



TABLE IV.2

MEMORY ALLOCATION FOR MULTIBUS TEST

Test No.   Location of Program   Location of Data   Remarks
1          Slave SBC             Slave SBC          Program and data being run at maximum rate
2          µPRO RAM              µPRO RAM           Program and data being run at maximum rate
3          Master SBC            µPRO RAM           Program and data being run at approximately
                                                    20% of the maximum rate



The maximum rates at which this test can run with
one to six microcomputers are shown in Table IV.3. Several
important facts can be noted.

(1) The bus transfer rate of each SBC decreases as
more SBCs want to use the bus, as expected.

(2) However, the maximum rate and the amount of
reduction vary from test to test. For example, in Test 1, we
were able to transfer 710 Kbyte/sec at most when only one SBC
was using the bus, compared with a maximum of 911 Kbyte/sec
for one SBC in Test 2. Test 2 showed that it is quicker to get
data out of the µPRO than out of the RAM on a different SBC.
This is easily explained: the control logic on the SBC must
decide whether the memory addressed is on-board or off-board,
and this decision takes time, which slows down the transfer
rate. When more SBCs were added in these two tests, the
transfer rate of every SBC decreased, but at different rates
in Test 1 and Test 2, as shown in Table IV.3. The rates are
also plotted in Fig. 4.1 to give a graphical view. Substantial
deterioration of the bus transfer rate took place in these
two cases, from 710 Kbyte/sec to 144 Kbyte/sec in Test 1 and
from 911 to 167.1 Kbyte/sec in Test 2.

(3) It should be pointed out that such heavy 
usage of the system bus should be allowed only during tests. 
A programmer who prepared an application program with such 
heavy bus usage would have failed to partition the program 
properly for parallel and pipeline computation in the 
multiple microcomputer system.

(4) Therefore, to provide a test more representative 
of real operating conditions, Test 3 was prepared, with its 
program in the RAM of the master SBC and its data in the 
global RAM of the µPRO board. It ran at 194.9 Kbyte/sec on 
the bus when only one SBC requested the bus. The deterioration 
of the system bus transfer rate is much more moderate in this 
test, from 194.9 Kbyte/sec for one SBC to 132 Kbyte/sec for 
six SBCs. This testifies to the ability of the intercommunication 
controller to treat all SBCs equally, without allowing any one 
SBC to dominate bus usage. 

Fig. 4.1 Maximum Bus Transfer Rate per Microcomputer (Three Test Cases)



TABLE IV.3 

SYSTEM BUS TRANSFER RATE (Kbyte/sec) FOR EVERY SBC IN 
THREE MULTIPLE MICROCOMPUTER SYSTEM TESTS 

No. of SBCs   Test 1   Test 2   Test 3
1             710      911      194.9
2             400.7    522      188
3             277.7    345.33   184
4             212      255.7    166
5             171.8    202.3    147.9
6             144      167.1    132
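To make the comparison in paragraphs (2) and (4) concrete, the short Python sketch below recomputes from Table IV.3 the percentage drop in per-SBC bus transfer rate between one and six SBCs. The names `TABLE_IV_3` and `percent_decline` are illustrative only; they are not part of the original test software.

```python
# Per-SBC system bus transfer rates (Kbyte/sec) transcribed from Table IV.3,
# listed for 1, 2, ..., 6 SBCs sharing the bus.
TABLE_IV_3 = {
    "Test 1": [710, 400.7, 277.7, 212, 171.8, 144],
    "Test 2": [911, 522, 345.33, 255.7, 202.3, 167.1],
    "Test 3": [194.9, 188, 184, 166, 147.9, 132],
}


def percent_decline(rates):
    """Percentage drop in per-SBC rate from the first entry (1 SBC) to the last."""
    first, last = rates[0], rates[-1]
    return 100.0 * (first - last) / first


if __name__ == "__main__":
    for test, rates in TABLE_IV_3.items():
        print(f"{test}: {rates[0]} -> {rates[-1]} Kbyte/sec, "
              f"decline {percent_decline(rates):.1f}%")
```

Run as written, it reports declines of about 79.7% for Test 1, 81.7% for Test 2 and 32.3% for Test 3, which is the contrast drawn in the text between the heavily loaded first two tests and the more realistic Test 3.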



(5) Further, the overhead loss of transfer rate 
in arbitrating the bus usage of several microcomputers is 
small. Consider Test 2. The maximum total bus transfer rate 
occurred when two SBCs were using the bus and was 
2 x 522 = 1044 Kbyte/sec. When six SBCs were using the bus, 
the total transfer rate on the bus was 6 x 167.1 = 
1002.6 Kbyte/sec. The loss is only (1044 - 1002.6)/1044 = 
0.0397, or 3.97%. Of course, each SBC suffered a loss of 
(911 - 167.1)/911 = 81.66% in its bus usage rate. It is 
interesting to note that the 167.1 Kbyte/sec obtained by each 
of six SBCs is close to one-sixth of the 911 Kbyte/sec (about 
152 Kbyte/sec) available when one SBC has the system all to 
itself; it is somewhat higher because the total throughput 
with six SBCs (1002.6 Kbyte/sec) exceeds 911 Kbyte/sec.
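The arithmetic of paragraph (5) can be reproduced in a few lines of Python. The sketch assumes, as the text does, that the total bus throughput for n SBCs is n times the per-SBC rate listed in Table IV.3; the function names are hypothetical.

```python
# Per-SBC rates for Test 2 (Kbyte/sec) from Table IV.3, for 1..6 SBCs.
TEST_2_RATES = [911, 522, 345.33, 255.7, 202.3, 167.1]


def total_throughput(rates):
    """Total bus throughput for n SBCs, taken as n * per-SBC rate (n = 1, 2, ...)."""
    return [n * r for n, r in enumerate(rates, start=1)]


def arbitration_overhead(rates):
    """Fractional loss of total throughput at the largest SBC count,
    relative to the best total throughput observed in the column."""
    totals = total_throughput(rates)
    peak = max(totals)
    return (peak - totals[-1]) / peak


if __name__ == "__main__":
    print([round(t, 2) for t in total_throughput(TEST_2_RATES)])
    print(f"overhead = {100 * arbitration_overhead(TEST_2_RATES):.2f}%")
```

For the Test 2 column it finds a peak total of 1044 Kbyte/sec at two SBCs and an overhead of about 3.97% at six SBCs.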

B. IMPLEMENTATION OF 3 x 3 SPATIAL FILTERING ON 
MULTIPLE MICROCOMPUTER SYSTEM 

1. Introduction 

Four different implementations were compared. They differed 
in how the programs, variables and data were stored in the 
various parts of the memory hierarchy, and in programming 
technique. For this development, all programs and data were 
stored in RAM on the single board microcomputers. This RAM is 
of two types: 

• Unshared RAM: "private" to the microcomputer on which the 
RAM is located. 

• Shared RAM: "global" and accessible by other microcomputers 
on the same Multibus. 



TABLE IV.4 

PROGRAM, DATA AND VARIABLE ALLOCATION 

Implementation   Program    Variables   Data
Case 1           Ideal case (not measured)
Case 2*          Unshared   Unshared    Shared
Case 3           Unshared   Unshared    Shared
Case 4           Unshared   Shared      Shared
Case 5           Shared     Shared      Shared

The results are presented in Fig. 4.2, which shows the number 
of image frames per second that can be processed by the 3 x 3 
spatial filter as a function of the number of microcomputers 
among which the filtering is partitioned for parallel operation. 
The image size is 30 x 30 pixels, and the partitioning splits 
the image into equal parts, one for each microcomputer.
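The partitioning just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis's assembly-language implementation: the filter is applied only where the full 3 x 3 window fits, each strip of output rows reads a one-row halo above and below, the strips run one after another instead of on separate SBCs, and the kernel and test image are made up.

```python
def filter_rows(image, kernel, row_start, row_stop):
    """Apply a 3x3 kernel ('valid' region only) to output rows row_start..row_stop-1.

    Output row i uses input rows i, i+1, i+2, so a strip reads input rows
    row_start .. row_stop + 1 (a one-row halo on each side of its own rows)."""
    width = len(image[0])
    out = []
    for i in range(row_start, row_stop):
        row = []
        for j in range(width - 2):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * image[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


def split_rows(n_rows, n_parts):
    """Split range(n_rows) into n_parts contiguous, nearly equal (start, stop) strips."""
    base, extra = divmod(n_rows, n_parts)
    bounds, start = [], 0
    for p in range(n_parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partitioned_filter(image, kernel, n_processors):
    """Filter the image strip by strip, one strip per (simulated) microcomputer."""
    n_out_rows = len(image) - 2
    result = []
    for start, stop in split_rows(n_out_rows, n_processors):
        # On the real system each strip would run on its own SBC in parallel;
        # here the strips are simply computed in sequence.
        result.extend(filter_rows(image, kernel, start, stop))
    return result


if __name__ == "__main__":
    size = 30  # image size quoted in the text
    image = [[float((7 * r + 3 * c) % 11) for c in range(size)] for r in range(size)]
    # Hypothetical clutter-suppression kernel: centre pixel minus local mean of 8 neighbours.
    kernel = [[-1 / 8] * 3, [-1 / 8, 1.0, -1 / 8], [-1 / 8] * 3]
    whole = filter_rows(image, kernel, 0, size - 2)
    for n in range(1, 7):
        assert partitioned_filter(image, kernel, n) == whole
    print("partitioned result matches single-processor result for 1..6 strips")
```

Because every output pixel is computed by the same arithmetic whichever strip it falls in, the partitioned result matches the single-processor result exactly; only the halo rows are read by two neighbouring microcomputers.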

The results are discussed below. 

a. The first case is not a measured result. It 
represents the ideal enhancement of computation by using 
multiple microcomputers. We first measured the execution 
speed of a spatial filter over the whole image on one 
microcomputer, with program, variables and data all in the 
private unshared RAM of the SBC. There was no bus usage and 
therefore no bus communication overhead. The maximum filtering 
speed is roughly two thousand pixels per second. For more SBCs, 
we simply multiplied this rate by the number of microcomputers 
and plotted the result as a "linear enhancement" curve. This 
represents the ideal case and serves as the goal for our 
partitioning to approach.
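The "linear enhancement" curve is simple arithmetic. The Python sketch below assumes the round figure of 2000 pixels per second quoted above and the 30 x 30-pixel frame used for Fig. 4.2; the exact single-SBC rate behind the plotted curve may differ slightly, and the constant names are hypothetical.

```python
PIXELS_PER_SEC_ONE_SBC = 2000.0  # "roughly two thousand pixels ... per second"
IMAGE_PIXELS = 30 * 30           # frame size used for Fig. 4.2


def ideal_frames_per_second(n_sbcs):
    """Ideal (linear) throughput: n SBCs process n times as many pixels per second."""
    return n_sbcs * PIXELS_PER_SEC_ONE_SBC / IMAGE_PIXELS


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, round(ideal_frames_per_second(n), 2))
```

Under these assumptions one SBC processes about 2.2 frames per second and five SBCs about 11.1.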

b. Consider first the case of lowest performance, 
Case 5, in which the program, variables and data were all 
located in the shared memory of another SBC. This case 
obviously required the maximum amount of transfer and system 
bus usage. Its performance saturated quite quickly, so the 
computational power of the added microcomputers was largely 
wasted. 

Fig. 4.2 Performance of the Partitioning of a Spatial Filter 
in a Multiple Microcomputer System (Parallel Processing)

c. In Case 4, the program was stored in the private 
memory of the computing SBC, but the variables and data were 
stored in the global memory of another SBC. The throughput 
improved almost linearly with the number of microcomputers, 
but at a lower rate than the "ideal linear enhancement."

d. In Case 3, both the program and the variables were 
stored in the unshared private RAM, but the data were stored 
in the global RAM of another SBC. Performance improved further. 
However, about 20% of the computing capability was lost to the 
overhead of arbitrating among multiple microcomputer requests.

e. In Case 2, the program, variables and data are 
located as in Case 3, but the program is written more carefully: 
the number of system bus accesses by each microcomputer is 
minimized, and these accesses are distributed as evenly in 
time as possible. The resulting enhancement of total computing 
power is much closer to the "ideal linear enhancement" case.

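One way to reduce the number of system bus accesses, in the spirit of Case 2, is to keep the three image rows currently in use in private RAM and to fetch each new row from global RAM just before it is needed. The Python sketch below counts global-RAM reads for that approach and for a direct approach that fetches all nine window pixels every time. It is a hypothetical model: `GlobalRAM` stands in for shared memory, every read is counted as one bus transfer, and the exact access pattern of the thesis program is not known.

```python
class GlobalRAM:
    """Stand-in for image data held in shared (global) RAM.
    Every pixel read is counted, since each would be one system bus transfer."""

    def __init__(self, image):
        self._image = image
        self.reads = 0

    def read_pixel(self, r, c):
        self.reads += 1
        return self._image[r][c]

    def read_row(self, r):
        self.reads += len(self._image[r])
        return list(self._image[r])  # private copy of the row


def filter_direct(ram, kernel, n_rows, width):
    """Fetch all nine window pixels over the bus for every output pixel."""
    out = []
    for i in range(n_rows - 2):
        row = []
        for j in range(width - 2):
            row.append(sum(kernel[di][dj] * ram.read_pixel(i + di, j + dj)
                           for di in range(3) for dj in range(3)))
        out.append(row)
    return out


def filter_buffered(ram, kernel, n_rows, width):
    """Keep the three rows in use in private RAM; fetch one new row per output row."""
    window = [ram.read_row(0), ram.read_row(1)]
    out = []
    for i in range(n_rows - 2):
        window.append(ram.read_row(i + 2))  # fetched just before it is needed
        row = []
        for j in range(width - 2):
            row.append(sum(kernel[di][dj] * window[di][j + dj]
                           for di in range(3) for dj in range(3)))
        out.append(row)
        window.pop(0)  # oldest row is no longer needed
    return out


if __name__ == "__main__":
    img = [[float((3 * r + c) % 7) for c in range(30)] for r in range(30)]
    k = [[1 / 9] * 3 for _ in range(3)]  # hypothetical 3x3 averaging kernel
    direct, buffered = GlobalRAM(img), GlobalRAM(img)
    same = filter_direct(direct, k, 30, 30) == filter_buffered(buffered, k, 30, 30)
    print(same, direct.reads, buffered.reads)
```

For a 30 x 30 image the direct version issues 7,056 global reads against 900 for the buffered version, while producing identical output. Fetching one row per output row also spreads the bus traffic over the whole computation instead of concentrating it.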
f. In summary, we have used the special case of 
spatial filtering to explore the behavior and improvement of 
computing with the multiple microcomputer system. It should 
be pointed out that although there have been many ideas in 
this field, real experience is still very limited. Consequently, 
there is no real consensus on the philosophy, approaches and 
methodologies of effective partitioning for parallel and 
pipeline computing. This thesis is a first step into these 
uncharted waters. We only used a spatial filter to test 
parallel processing; we have not yet used a problem to test 
pipeline processing or combined parallel/pipeline processing. 
Therefore, we do not claim that the experience gained from 
this spatial filtering establishes a general methodology for 
effective partitioning. 

But we feel that the following guidelines will probably 
be helpful when more complex problems are tested to develop 
a more thorough philosophy of partitioning: 

a) The bus usage should be minimized. 

b) The bus usage should be distributed more evenly 
in time. Concentration of bus usage should be 
avoided. 
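The two guidelines can be illustrated with a toy model of a single shared bus. In the Python sketch below, which is a hypothetical simulation and not a model of the actual Multibus or intercommunication controller, the bus completes one transfer per time slot and picks at random among pending requests, loosely imitating random priority arbitration. The same 24 transfers from six SBCs are issued either all at once or staggered in time.

```python
import random


def simulate_bus(request_times, seed=0):
    """Single shared bus that completes one transfer per time slot.

    request_times[k] is the slot in which request k is issued. When several
    requests are pending, one is granted at random (a simple stand-in for a
    random priority arbiter). Returns the total number of slots spent waiting."""
    rng = random.Random(seed)
    order = sorted(request_times)
    pending = []
    idx, t, total_wait = 0, 0, 0
    while idx < len(order) or pending:
        while idx < len(order) and order[idx] <= t:
            pending.append(order[idx])  # request becomes visible to the arbiter
            idx += 1
        if pending:
            issued = pending.pop(rng.randrange(len(pending)))
            total_wait += t - issued
        t += 1
    return total_wait


if __name__ == "__main__":
    n_sbc, n_req = 6, 4
    burst = [0] * (n_sbc * n_req)  # every SBC asks for the bus at once
    spread = [sbc + n_sbc * m for sbc in range(n_sbc) for m in range(n_req)]  # staggered
    print("burst:", simulate_bus(burst), "spread:", simulate_bus(spread))
```

The burst schedule accumulates 276 slots of waiting (0 + 1 + ... + 23), while the staggered schedule accumulates none. With unit-length transfers the total wait does not depend on which pending request the arbiter selects, only on how the requests are spread in time; the arbitration policy decides which SBC does the waiting.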

g. Meanwhile, it should be pointed out that this 
implementation of spatial filtering is a test case based on 
a real computation problem. In addition to the experience 
gained in partitioning, the successful implementation of the 
spatial filtering with up to five microcomputers in parallel 
demonstrated that the random priority controller is working 
correctly. 

V. CONCLUSION AND RECOMMENDATIONS 



A. CONCLUSION 

1. Motivation 

This thesis was motivated by the needs of new smart 
sensor developments. With the anticipated arrival of new, 
sensitive, large mosaic optical sensor arrays and very 
sophisticated signal/data processing capabilities offered by 
VLSI/VHSIC electronics, very ambitious mission objectives for 
new surveillance, search/track and weapon guidance systems 
are being proposed and developed. These systems require new 
signal processing techniques to accomplish demanding goals. 
They also require very sophisticated processor systems that 
are powerful enough to implement the new signal processing 
algorithms, yet small and light enough to be mounted on the 
platforms of practical systems.

2. Single Objective and Dual Tasks 

This thesis has a single objective, to help make 
the new "smart sensors" practical, and consists of two tasks 
to achieve this objective: 

a. Develop new adaptive filter techniques to process 
infrared images for enhancement of the "target signal" 
to "background clutter noise" ratio. 

b. Develop a new multiple microcomputer system to 
implement this type of image processing. 

3. Extensions and Contributions 



Both studies, although motivated by the development 
of "infrared smart sensors," are generic and can contribute 
to fields well beyond the image processing problems of 
infrared smart sensor systems.

4. Results I - Adaptive Filters 

The following results have been obtained: 

a. Adaptive filter research done in the past was 
surveyed. It was found that: 

• Practically all past research dealt with one-dimensional 
problems. The one exception is the work of B. Evenor, who 
extended the LMS algorithm to images generated by Markov 
models. 

• Most approaches are based on LMS algorithms. 

b. In this thesis the LMS algorithm was extended to 
process real-world infrared images. 

c. A new approach to nonrecursive adaptive filters 
was developed, which is similar to searching for the extremum 
of an objective function in optimization problems. 

d. Two optimization criteria were considered: 

mMSE = minimization of mean square error 
MSNR = maximization of signal to noise ratio. 

e. Seven different optimization/searching techniques 
were developed: 

• Gradient approaches: 
     Steepest descent 
     Accelerated steepest descent 
     Amir's method (mMSE only) 

• Conjugate gradient approaches: 
     Fletcher-Reeves 
     Pollack 

• Variable metric approach: 
     Davidon-Fletcher-Powell 

• Amir's transform approach (MSNR only) 

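For the mMSE criterion, designing a nonrecursive filter amounts to finding the minimum of a quadratic error surface, J(w) = E[d^2] - 2p'w + w'Rw, where R is the autocorrelation matrix of the filter inputs and p their cross-correlation with the desired signal. As a generic illustration, not the thesis's own programs, the Python sketch below applies two of the listed families to a made-up three-weight problem: steepest descent with an exact line search, and linear conjugate gradient with the Fletcher-Reeves update. On a quadratic with exact line searches the Polak-Ribière update (presumably the method listed as "Pollack") produces the same directions. The accelerated, variable metric and Amir methods are not shown.

```python
def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def steepest_descent(R, p, w, iters=200, tol=1e-20):
    """Minimise J(w) = w'Rw - 2p'w (the MSE up to a constant) by steepest
    descent with an exact line search along the negative gradient."""
    for _ in range(iters):
        r = [pi - qi for pi, qi in zip(p, mat_vec(R, w))]  # r = p - Rw = -gradient / 2
        rr = dot(r, r)
        if rr < tol:
            break
        alpha = rr / dot(r, mat_vec(R, r))  # exact minimising step for a quadratic
        w = [wi + alpha * ri for wi, ri in zip(w, r)]
    return w


def conjugate_gradient(R, p, w, tol=1e-20):
    """Linear conjugate gradient with the Fletcher-Reeves beta. For n weights it
    reaches the minimum of the quadratic in at most n steps (exact arithmetic)."""
    r = [pi - qi for pi, qi in zip(p, mat_vec(R, w))]
    d = list(r)
    rr = dot(r, r)
    for _ in range(len(w)):
        if rr < tol:
            break
        Rd = mat_vec(R, d)
        alpha = rr / dot(d, Rd)
        w = [wi + alpha * di for wi, di in zip(w, d)]
        r = [ri - alpha * x for ri, x in zip(r, Rd)]
        rr_new = dot(r, r)
        beta = rr_new / rr  # Fletcher-Reeves update
        d = [ri + beta * di for ri, di in zip(r, d)]
        rr = rr_new
    return w


if __name__ == "__main__":
    R = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # made-up input autocorrelation
    p = [3.0, -1.5, 0.0]  # made-up cross-correlation; R w = p at w = (1, -1, 0.5)
    print("steepest descent:  ", steepest_descent(R, p, [0.0, 0.0, 0.0]))
    print("conjugate gradient:", conjugate_gradient(R, p, [0.0, 0.0, 0.0]))
```

Both searches converge to the weights w = (1, -1, 0.5) that solve Rw = p; conjugate gradient gets there in three steps, while steepest descent needs many more iterations, which is the usual motivation for the conjugate gradient family.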
f. These approaches were tested on two infrared test 
images: 

• Indiana - blue spike band infrared image, appropriate 
for high-altitude, downward-looking infrared sensor 
systems. 

• China Lake - 10-13 micron thermal band infrared image, 
appropriate for shorter-distance, side-looking infrared 
sensor systems. 

The results are encouraging and show that these new 
adaptive filters are effective in suppressing background clutter 
and enhancing the "target signal" to "clutter noise" ratio. 

5. Results II - Multiple Microcomputer System 

a. Past research on tightly coupled multiple microcomputer 
systems was surveyed. It was found that: 

• There are many conceptual designs of new multiple 
microcomputer systems. Only a very small number of 
these have progressed to actual development with both 
hardware and software efforts. 

• Loosely coupled multiple microcomputer systems are 
being developed in greater numbers; most of them are 
computer networks. 

• Based on a survey of the open literature, there are 
only two tightly coupled multiple microcomputer systems 
in operation today, both at Carnegie Mellon University: 
Cmmp and Cm*. It should be noted that although Cmmp is 
a multiple minicomputer system, today's 16 bit 
microcomputers are fast approaching minicomputer 
performance. 

b. Based on an intensive consideration of the 
requirements of typical new smart sensor systems, not only in 
the mission signal processing area but also in the management, 
control and communication areas, it was decided that a 
hierarchical architecture which supports simultaneous tightly 
and loosely coupled systems is attractive. 

c. A multiple star, multiple cluster architecture 
using commercially developed 16 bit microcomputers was 
developed. A complete star bus switch network was developed, 
managed by a control system with three levels of control: 
random priority controller, distributed controller and 
central controller. 

d. The basic concept of this hardware architecture 
has been tested by simulated intercommunications. Extensive 
tests in real signal/data processing environments await the 
successful development of operating systems. 

6. Results III - Implementation of Adaptive Spatial 
Filters on Microcomputers and Multiple 
Microcomputer Systems 

a. The spatial filter program was coded for one 
mainframe, the IBM 360-67, and two 16 bit microcomputers, 
the DEC LSI-11 and the Intel 86/12. The DEC LSI-11 has more 
mature floating point mathematics software and a hardware 
arithmetic IC chip, but it is not as well suited for multiple 
microcomputer system development as the Intel 86/12, whose 
floating point software is still very primitive. However, 
when coded in assembly language, the Intel 86/12 performs 
the spatial filtering faster than the mainframe running a 
version coded in a high-order language. 

b. With only one 16 bit 86/12 microcomputer, the 
computation times for the 3x3 spatial filter and a 32 x 32 
image were measured as follows: 

Spatial statistics computation = 0.72 sec 
Adaptive spatial filter design = 1.0 sec (conjugate gradient, Pollack method) 
Perform spatial filtering      = 0.47 sec 

c. Several ways of implementing the filter on the 
multiple microcomputer system, placing the program, variables 
and data in the unshared private RAM and/or the shared global 
RAM, have been investigated. 

It was found that the total execution speed of the 
spatial filtering improves most with added microcomputers when 
the program and variables are stored in the private RAM and 
the data in the global RAM. The image data are not moved into 
the microcomputer all at once. Instead, the data are moved, 
one at a time, into the private RAM of the microcomputer only 
moments before they are needed for processing. 

B. RECOMMENDATIONS 

1. General 

Both topics covered in this thesis are quite new. 
This research only opens the door slightly into two fields 
worthy of further investigation. Although this thesis is 
concerned mainly with image processing developments and 
their implementation for infrared smart sensors, the 
techniques developed are generic and can be applied to much 
broader fields beyond smart sensors. 






2. Adaptive Filters 

The new techniques based on gradient and optimization 
search concepts can be applied to most of the adaptive 
filter problems addressed in the past with the LMS algorithm. 

For adaptive image processing applications, they 
should also be used to develop adaptive temporal filters when 
successive image frames are reasonably well registered 
spatially from frame to frame, even though there may be 
drift, jitter, rotation, etc. between frames. 

Testing of these adaptive filters on more challenging 
real-world images with serious non-stationarity should be 
performed to give the adaptive filtering techniques some 
tough challenges. Jamming and interference noise should be 
considered. The convergence times of the compiled adaptive 
filter programs should be measured to compare the convergence 
speeds of all the adaptation methods. Adaptive filters for 
extended targets should be developed. 

3. Multiple Microcomputer System 

Although the subject of multiple microcomputer systems 
is not new, there are many unresolved questions that have 
hardly been touched because of the extensive effort required 
to make any type of multiple microcomputer system operational. 
Only two such systems are known to be working today, Cmmp and 
Cm*, although many system architectures have been proposed 
and conceptualized. A small number of these have been 
simulated, a smaller number are being emulated, and an even 
smaller number are being built. Simulations and models used 
today for multiple microcomputer systems must be carefully 
and critically scrutinized for their validity and usefulness; 
it is extremely important to examine how the intercommunication 
overhead is modeled and simulated. 

There is very little first-hand experience in existence today. 
Therefore, a wide variety of problems associated with 
the new multiple microcomputer systems must be researched, 
examined and answered. 

This thesis contributed to the formulation, design, 
fabrication and testing of a multiple microcomputer system 
which can be used: 

1. Not only for developing effective ways of implementing 
smart sensor image processing in general, and adaptive 
image processing in particular, 

2. But also as a test bed for developing, verifying and 
improving approaches to several basic issues of multiple 
microcomputer systems, including: 

a. Effective and alternative intercommunication for 
combined tightly and loosely coupled systems. 

b. Effective and alternative operating systems for 
real time signal processing, multi-tasking, multiple users, 
security, dynamic reconfiguration and fault tolerance. 

c. Effective and alternative programming methodologies 
for partitioning a given problem into a number of modules suit- 
able for combined pipeline and parallel implementation on 
multiple microcomputer systems. 






d. Effective and alternative ways of using the 
distributed capabilities of multiple microcomputer systems for 
fault tolerance, self-maintenance and error recovery. 